9. Dezember 2016

A PL/SQL "JSON explorer" based on the APEX_JSON package

Vor kurzem hat mich ein Kollege um Hilfe beim Parsen eines JSON-Dokumentes gebeten. Dafür gibt es zwar schon einige Beispiele mit dem Package APEX_JSON oder den nativen SQL/JSON Funktionen in Oracle12c. Der Fall lag hier aber etwas anders.

Es geht darum, ein JSON-Dokument zu parsen, dessen Struktur man gar nicht kennt. Egal, ob man das Dokument mit APEX_JSON oder mit den nativen SQL/JSON-Funktionen in Oracle12c parsen möchte - stets muss man wissen, an welchen Attributen man in welchen Stellen des JSON-Dokumentes interessiert ist. Möchte man das JSON jedoch einfach nur "erforschen", so kommt man mit diesen beiden Ansätzen nicht weiter.

Angenommen, es geht darum, das folgende JSON zu parsen und Informationen über alle Knoten zu erhalten:

{
  "string": "value",
  "number": 0,
  "boolean": true,
  "array": [ 1,2,3 ],
  "object": {
     "string": "value",
     "array": [
       {
          "number": 0,
          "string": "value"
       },
       {
          "number": 1,
          "string": "value"
       }
     ],
     "object": { 
        "array": [ "v1", "v2", "v3" ]
     }
  }
}

Man siegt, eine Menge geschachtelter Strukturen. Eine Idee ist es, den in diesem Blog Posting erwähnten Kniff mit APEX_JSON.TO_XMLTYPE zu nutzen: Damit wandelt man das JSON in ein XML-Dokument um ...

select xmlserialize(
  content apex_json.to_xmltype(
    '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }'
  ) indent 
) from dual; 

<?xml version="1.0" encoding="UTF-8"?>
<json>
  <string>value</string>
  <number>0</number>
  <boolean>true</boolean>
  <array>
    <row>1</row>
    <row>2</row>
    <row>3</row>
  </array>
  :

... und die SQL/XML-Funktionen bieten einige Möglichkeiten an, die Struktur zu erforschen ...

with xml_from_json as (
  select apex_json.to_xmltype(
    '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }'
  ) xml from dual
)
select 
  tagname, 
  parentname,
  count(*) occurrences
from xml_from_json, xmltable(
  'for $x in //*
   return <r><t>{$x/name()}</t><p>{$x/parent::*/name()}</p></r>'
  passing xml
  columns
    tagname    varchar2(15) path '/r/t',
    parentname varchar2(15) path '/r/p'
)
group by tagname, parentname

TAGNAME  PARENTNAME  OCCURRENCES  
-------- ----------- ------------
object   object      1            
string   object      1            
string   json        1            
object   json        1            
number   row         2            
json                 1            
row      array       8
:        :           :

Unter Umständen kommt man hiermit schon sehr weit - eine Kleinigkeit ist allerdings beim Umwandeln nach XML verlorengegangen; und das sind die Datentypen der JSON-Attribute. JSON kennt hier immerhin numerische, boolean und String-Datentypen - im XML-Dokument ist alles zum String geworden. Möchte man diese Information behalten, so bleibt noch der programmatische Weg mit dem Paket APEX_JSON. Der nun folgende PL/SQL Code implementiert einen "JSON-Explorer" als Table-Function; man gibt ein JSON-Dokument hinein und bekommt eine Liste mit enthaltenen Elementen, Datentypen und Positionen in der Hierarchie zurück.

drop type json_element_ct
/

drop type json_element_t
/

create type json_element_t as object (
   attr_level       number,
   attr_name        varchar2(255),
   attr_type        varchar2(30),
   attr_value       varchar2(255),
   attr_path        varchar2(255),
   parent_attr_path varchar2(255)
)
/

create type json_element_ct as table of json_element_t
/

create or replace function parse_json( p_json in clob ) return json_element_ct is
   l_parsed_json    apex_json.t_values;

   l_root_children  wwv_flow_t_varchar2;
   l_root_node      apex_json.t_value;

   l_p0             number := 0;
   l_p1             number := 0;
   l_p2             number := 0;
   l_p3             number := 0;
   l_p4             number := 0;

   l_result         json_element_ct := json_element_ct();

   function replace_path( p_path in varchar2 ) return varchar2 is 
   begin
       return regexp_replace(
                 regexp_replace(
                    regexp_replace(
                       regexp_replace(
                          regexp_replace( p_path, '%d', l_p0, 1, 1 ),
                          '%d', l_p1, 1, 1 ),
                       '%d', l_p2, 1, 1 ),  
                    '%d', l_p3, 1, 1 ),
                 '%d', l_p4, 1, 1 );
   end replace_path;

   procedure evaluate_node(
       p_prefix       in varchar2,
       p_node         in apex_json.t_value,
       p_node_name    in varchar2,
       p_parent_name  in varchar2            default null,
       p_level        in number              default 0,
       p_arraylevel   in number              default 1
   ) is
       l_node          apex_json.t_value;
       l_node_children wwv_flow_t_varchar2;

       l_type          varchar2(30);
       l_value         varchar2(255);
   begin
       if p_node.kind = 2 then
           l_type := 'boolean';
           l_value := 'true';

       elsif p_node.kind = 3 then
           l_type := 'boolean';
           l_value := 'false';

       elsif p_node.kind = 4 then
           l_type := 'number';
           l_value := to_char( p_node.number_value );

       elsif p_node.kind = 5 then
           l_type := 'varchar2';
           l_value := p_node.varchar2_value;

       elsif p_node.kind = 7 then
           l_type := 'array';
           l_value := null;

           l_result.extend(1);
           l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) );

           if p_arraylevel = 1 then l_p0 := 0; end if;
           if p_arraylevel = 2 then l_p1 := 0; end if;
           if p_arraylevel = 3 then l_p2 := 0; end if;
           if p_arraylevel = 4 then l_p3 := 0; end if;
           if p_arraylevel = 5 then l_p4 := 0; end if;

           for j in 1 .. p_node.number_value loop

               if p_arraylevel = 1 then l_p0 := l_p0 + 1; end if;
               if p_arraylevel = 2 then l_p1 := l_p1 + 1; end if;
               if p_arraylevel = 3 then l_p2 := l_p2 + 1; end if;
               if p_arraylevel = 4 then l_p3 := l_p3 + 1; end if;
               if p_arraylevel = 5 then l_p4 := l_p4 + 1; end if;

               l_node := apex_json.get_value( 
                   p_values => l_parsed_json, 
                   p_path =>   p_prefix || p_node_name|| '[%d]', 
                   p0 =>       l_p0, 
                   p1 =>       l_p1, 
                   p2 =>       l_p2, 
                   p3 =>       l_p3, 
                   p4 =>       l_p4 );
 
               evaluate_node( 
                   p_prefix =>      p_prefix || p_node_name || '[%d]', 
                   p_node =>        l_node, 
                   p_node_name =>   null,
                   p_parent_name => p_prefix || p_node_name,
                   p_level =>       p_level + 1, 
                   p_arraylevel =>  p_arraylevel + 1 );

           end loop;

       elsif p_node.kind = 6 then
           l_type := 'object';
           l_value := null;

           l_result.extend(1);
           l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) );

           l_node_children := apex_json.get_members( 
               p_values => l_parsed_json, 
               p_path =>   p_prefix || p_node_name,  
               p0 =>       l_p0, 
               p1 =>       l_p1, 
               p2 =>       l_p2, 
               p3 =>       l_p3, 
               p4 =>       l_p4 );

           if l_node_children is not null then

               for i in 1 .. l_node_children.count loop

                   l_node := apex_json.get_value( 
                        p_values => l_parsed_json, 
                        p_path =>   p_prefix || p_node_name || '.' || l_node_children( i ), 
                        p0 =>       l_p0, 
                        p1 =>       l_p1, 
                        p2 =>       l_p2, 
                        p3 =>       l_p3, 
                        p4 =>       l_p4 );

                    evaluate_node( 
                        p_prefix =>      p_prefix || p_node_name || '.', 
                        p_node =>        l_node, 
                        p_node_name =>   l_node_children( i ),
                        p_parent_name => p_prefix || p_node_name,
                        p_level =>       p_level + 1, 
                        p_arraylevel =>  p_arraylevel );

                end loop;
            end if;
       end if;

       if p_node.kind in ( 2, 3, 4, 5 ) then 
           l_result.extend( 1 );
           l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) );
       end if;
  end;
begin
    apex_json.parse( l_parsed_json, p_json );
    l_root_children := apex_json.get_members( p_values => l_parsed_json, p_path => '.' );

    l_result.extend( 1 );
    l_result( l_result.count ) := json_element_t ( null, '<root>', 'object', null, '<root>', null );

    if l_root_children is not null then

         for i in 1 .. l_root_children.count loop

             l_root_node := apex_json.get_value( 
                 p_values => l_parsed_json, 
                 p_path =>   l_root_children( i ) );

             evaluate_node( 
                 p_prefix =>      null,
                 p_node =>        l_root_node, 
                 p_node_name =>   l_root_children( i ),
                 p_parent_name => '<root>' );
         end loop;
    end if;
    return l_result;
end;
/
sho err

Ausprobieren ist dann einfach ...

select * from table(parse_json('{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } } ' ));

ATTR_LEVEL  ATTR_NAME  ATTR_TYPE  ATTR_VALUE  ATTR_PATH               PARENT_ATTR_PATH     
----------- ---------- ---------- ----------- ----------------------- -------------------
            <root>     object                 <root>                                       
0           string     varchar2   value       string                  <root>               
0           number     number     0           number                  <root>               
0           boolean    boolean    true        boolean                 <root>               
0           array      array                  array                   <root>               
1           <no name>  number     1           array[1]                array                
1           <no name>  number     2           array[2]                array                
1           <no name>  number     3           array[3]                array                
0           object     object                 object                  <root>               
1           string     varchar2   value       object.string           object               
1           array      array                  object.array            object               
2           <no name>  object                 object.array[1]         object.array         
3           number     number     0           object.array[1].number  object.array[1]      
3           string     varchar2   value       object.array[1].string  object.array[1]      
:           :          :          :           :                       : 

Wer mag, kann das ganze mit hierarischem SQL ein wenig aufbereiten ...

select lpad(' ', level * 4 - 1, ' ') || attr_name, attr_type, attr_value, attr_path
from data
start with parent_attr_path is null 
connect by parent_attr_path = prior attr_path
/

ATTR_NAME                      ATTR_TYPE  ATTR_VALUE  ATTR_PATH  
------------------------------ ---------- ----------- ------------------------              
<root>                         object                 <root>                  
    array                      array                  array                   
        <no name>              number     1           array[1]                
        <no name>              number     2           array[2]                
        <no name>              number     3           array[3]                
    boolean                    boolean    true        boolean                 
    number                     number     0           number                  
    object                     object                 object                  
        array                  array                  object.array            
            <no name>          object                 object.array[1]         
                number         number     0           object.array[1].number 
:                              :          :           : 

Fertig; viel Spaß beim Ausprobieren. Die Funktion arbeitet sich durch die ganze Hierarchie; allerdings gilt für Arrays, dass die Schachtelungstiefe maximal fünf sein darf. In das sechste, geschachtelte (!) Array kann die Funktion nicht einsteigen, da APEX_JSON hier auf fünf Ebenen begrenzt ist. Für die meisten Anforderungen sollte das aber genug sein. Außerdem habe ich den Fall, dass das JSON-Dokument bereits auf Root-Ebene ein Array ist, nicht berücksichtigt; es muss ein Objekt sein (das erste Zeichen im JSON muss die geöffnete, geschweifte Klammer "{" sein).

Recently, I helped colleague in how to parse a JSON document in PL/SQL - that was a very interesting case, which I did not blog about so far. I already have blog postings about parsing JSON with SQL or PL/SQL: JSON parsing with the APEX_JSON package or JSON parsing using the Oracle12c native SQL functions. But this case was different.

The requirement was to explore a JSON document - it's not about looking up a very specific value; we want to know which attributes and values are in the document. Assumed, we have the following JSON document and we want to explore it programmatically - what's in the document?

{
  "string": "value",
  "number": 0,
  "boolean": true,
  "array": [ 1,2,3 ],
  "object": {
     "string": "value",
     "array": [
       {
          "number": 0,
          "string": "value"
       },
       {
          "number": 1,
          "string": "value"
       }
     ],
     "object": { 
        "array": [ "v1", "v2", "v3" ]
     }
  }
}

We see some nested structures, objects and arrays. One approach would be to use the APEX_JSON.TO_XMLTYPE function as described in the JSON parsing with APEX_JSON blog posting. With that function, we can convert the JSON document to an XMLTYPE ...

select xmlserialize(
  content apex_json.to_xmltype(
    '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }'
  ) indent 
) from dual; 

<?xml version="1.0" encoding="UTF-8"?>
<json>
  <string>value</string>
  <number>0</number>
  <boolean>true</boolean>
  <array>
    <row>1</row>
    <row>2</row>
    <row>3</row>
  </array>
  :

... which allows us to use all the XML DB and SQL/XML functions on it.

with xml_from_json as (
  select apex_json.to_xmltype(
    '{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } }'
  ) xml from dual
)
select 
  tagname, 
  parentname,
  count(*) occurrences
from xml_from_json, xmltable(
  'for $x in //*
   return <r><t>{$x/name()}</t><p>{$x/parent::*/name()}</p></r>'
  passing xml
  columns
    tagname    varchar2(15) path '/r/t',
    parentname varchar2(15) path '/r/p'
)
group by tagname, parentname

TAGNAME  PARENTNAME  OCCURRENCES  
-------- ----------- ------------
object   object      1            
string   object      1            
string   json        1            
object   json        1            
number   row         2            
json                 1            
row      array       8
:        :           :

That is a nice result; we can see the attribute names, parent names and (if we adjust the SQL query) also the attribute values. But the data types of the JSON attributes are lost: JSON attributes are either numeric, boolean or strings - with the conversion to XMLTYPE they all became strings - and we cannot get that information any more.

The APEX_JSON package provides procedures and functions to explore an unknown JSON document. The following code implements a PL/SQL table function which acts as a "JSON explorer": It parses a JSON and returns all found nodes and attributes together with some context information. Here is the code:

drop type json_element_ct
/

drop type json_element_t
/

create type json_element_t as object (
   attr_level       number,
   attr_name        varchar2(255),
   attr_type        varchar2(30),
   attr_value       varchar2(255),
   attr_path        varchar2(255),
   parent_attr_path varchar2(255)
)
/

create type json_element_ct as table of json_element_t
/

create or replace function parse_json( p_json in clob ) return json_element_ct is
   l_parsed_json    apex_json.t_values;

   l_root_children  wwv_flow_t_varchar2;
   l_root_node      apex_json.t_value;

   l_p0             number := 0;
   l_p1             number := 0;
   l_p2             number := 0;
   l_p3             number := 0;
   l_p4             number := 0;

   l_result         json_element_ct := json_element_ct();

   function replace_path( p_path in varchar2 ) return varchar2 is 
   begin
       return regexp_replace(
                 regexp_replace(
                    regexp_replace(
                       regexp_replace(
                          regexp_replace( p_path, '%d', l_p0, 1, 1 ),
                          '%d', l_p1, 1, 1 ),
                       '%d', l_p2, 1, 1 ),  
                    '%d', l_p3, 1, 1 ),
                 '%d', l_p4, 1, 1 );
   end replace_path;

   procedure evaluate_node(
       p_prefix       in varchar2,
       p_node         in apex_json.t_value,
       p_node_name    in varchar2,
       p_parent_name  in varchar2            default null,
       p_level        in number              default 0,
       p_arraylevel   in number              default 1
   ) is
       l_node          apex_json.t_value;
       l_node_children wwv_flow_t_varchar2;

       l_type          varchar2(30);
       l_value         varchar2(255);
   begin
       if p_node.kind = 2 then
           l_type := 'boolean';
           l_value := 'true';

       elsif p_node.kind = 3 then
           l_type := 'boolean';
           l_value := 'false';

       elsif p_node.kind = 4 then
           l_type := 'number';
           l_value := to_char( p_node.number_value );

       elsif p_node.kind = 5 then
           l_type := 'varchar2';
           l_value := p_node.varchar2_value;

       elsif p_node.kind = 7 then
           l_type := 'array';
           l_value := null;

           l_result.extend(1);
           l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) );

           if p_arraylevel = 1 then l_p0 := 0; end if;
           if p_arraylevel = 2 then l_p1 := 0; end if;
           if p_arraylevel = 3 then l_p2 := 0; end if;
           if p_arraylevel = 4 then l_p3 := 0; end if;
           if p_arraylevel = 5 then l_p4 := 0; end if;

           for j in 1 .. p_node.number_value loop

               if p_arraylevel = 1 then l_p0 := l_p0 + 1; end if;
               if p_arraylevel = 2 then l_p1 := l_p1 + 1; end if;
               if p_arraylevel = 3 then l_p2 := l_p2 + 1; end if;
               if p_arraylevel = 4 then l_p3 := l_p3 + 1; end if;
               if p_arraylevel = 5 then l_p4 := l_p4 + 1; end if;

               l_node := apex_json.get_value( 
                   p_values => l_parsed_json, 
                   p_path =>   p_prefix || p_node_name|| '[%d]', 
                   p0 =>       l_p0, 
                   p1 =>       l_p1, 
                   p2 =>       l_p2, 
                   p3 =>       l_p3, 
                   p4 =>       l_p4 );
 
               evaluate_node( 
                   p_prefix =>      p_prefix || p_node_name || '[%d]', 
                   p_node =>        l_node, 
                   p_node_name =>   null,
                   p_parent_name => p_prefix || p_node_name,
                   p_level =>       p_level + 1, 
                   p_arraylevel =>  p_arraylevel + 1 );

           end loop;

       elsif p_node.kind = 6 then
           l_type := 'object';
           l_value := null;

           l_result.extend(1);
           l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) );

           l_node_children := apex_json.get_members( 
               p_values => l_parsed_json, 
               p_path =>   p_prefix || p_node_name,  
               p0 =>       l_p0, 
               p1 =>       l_p1, 
               p2 =>       l_p2, 
               p3 =>       l_p3, 
               p4 =>       l_p4 );

           if l_node_children is not null then

               for i in 1 .. l_node_children.count loop

                   l_node := apex_json.get_value( 
                        p_values => l_parsed_json, 
                        p_path =>   p_prefix || p_node_name || '.' || l_node_children( i ), 
                        p0 =>       l_p0, 
                        p1 =>       l_p1, 
                        p2 =>       l_p2, 
                        p3 =>       l_p3, 
                        p4 =>       l_p4 );

                    evaluate_node( 
                        p_prefix =>      p_prefix || p_node_name || '.', 
                        p_node =>        l_node, 
                        p_node_name =>   l_node_children( i ),
                        p_parent_name => p_prefix || p_node_name,
                        p_level =>       p_level + 1, 
                        p_arraylevel =>  p_arraylevel );

                end loop;
            end if;
       end if;

       if p_node.kind in ( 2, 3, 4, 5 ) then 
           l_result.extend( 1 );
           l_result( l_result.count ) := json_element_t ( p_level, nvl( p_node_name, '<no name>'), l_type, l_value, replace_path( p_prefix || p_node_name ), replace_path( p_parent_name ) );
       end if;
  end;
begin
    apex_json.parse( l_parsed_json, p_json );
    l_root_children := apex_json.get_members( p_values => l_parsed_json, p_path => '.' );

    l_result.extend( 1 );
    l_result( l_result.count ) := json_element_t ( null, '<root>', 'object', null, '<root>', null );

    if l_root_children is not null then

         for i in 1 .. l_root_children.count loop

             l_root_node := apex_json.get_value( 
                 p_values => l_parsed_json, 
                 p_path =>   l_root_children( i ) );

             evaluate_node( 
                 p_prefix =>      null,
                 p_node =>        l_root_node, 
                 p_node_name =>   l_root_children( i ),
                 p_parent_name => '<root>' );
         end loop;
    end if;
    return l_result;
end;
/
sho err

We can test it easily ...

select * from table(parse_json('{ "string": "value", "number": 0, "boolean": true, "array": [ 1,2,3 ], "object": { "string": "value", "array": [ { "number": 0, "string": "value" }, { "number": 1, "string": "value" } ], "object": { "array": [ "v1", "v2", "v3" ] } } } ' ));

ATTR_LEVEL  ATTR_NAME  ATTR_TYPE  ATTR_VALUE  ATTR_PATH               PARENT_ATTR_PATH     
----------- ---------- ---------- ----------- ----------------------- -------------------
            <root>     object                 <root>                                       
0           string     varchar2   value       string                  <root>               
0           number     number     0           number                  <root>               
0           boolean    boolean    true        boolean                 <root>               
0           array      array                  array                   <root>               
1           <no name>  number     1           array[1]                array                
1           <no name>  number     2           array[2]                array                
1           <no name>  number     3           array[3]                array                
0           object     object                 object                  <root>               
1           string     varchar2   value       object.string           object               
1           array      array                  object.array            object               
2           <no name>  object                 object.array[1]         object.array         
3           number     number     0           object.array[1].number  object.array[1]      
3           string     varchar2   value       object.array[1].string  object.array[1]      
:           :          :          :           :                       : 

And with some hierarchical SQL, we can pretty-print the JSON hierarchy.

select lpad(' ', level * 4 - 1, ' ') || attr_name, attr_type, attr_value, attr_path
from data
start with parent_attr_path is null 
connect by parent_attr_path = prior attr_path
/

ATTR_NAME                      ATTR_TYPE  ATTR_VALUE  ATTR_PATH  
------------------------------ ---------- ----------- ------------------------              
<root>                         object                 <root>                  
    array                      array                  array                   
        <no name>              number     1           array[1]                
        <no name>              number     2           array[2]                
        <no name>              number     3           array[3]                
    boolean                    boolean    true        boolean                 
    number                     number     0           number                  
    object                     object                 object                  
        array                  array                  object.array            
            <no name>          object                 object.array[1]         
                number         number     0           object.array[1].number 
:                              :          :           : 

That's it - have fun trying this out. Basically the function walks through the complete JSON hierarchy, but due to a restriction in APEX_JSON, it can only access up to five nested(!) arrays. Also I did not take care for the case that the JSON document is an array at the root level (starting with a "["); it only works for JSON documents which are objects at the root level (starting with a "{").

Beliebte Postings