1. Get rid of all advertisements and get unlimited access to documents by upgrading to Premium Membership. Upgrade to Premium Now and also get a Premium Badge!

Regex to find all adress information

Discussion in 'SQL PL/SQL' started by mr-sansibar, Mar 25, 2014.

  1. mr-sansibar

    mr-sansibar Active Member

    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    55
    I'm not so practiced in regular expression.
    I wanted parse a csv-file in a table. specially the problem is, that I couln't find a regex pattern to match the street.



    File

    Band, Blatt: n.n., 1111 Adresse: Hax-Basalt-Str. 13a 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 2222 Adresse: Kirchburg Str. 16 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 3333 Adresse: Amselweg 11 666666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
    Band, Blatt: n.n., 5555Adresse: Auf der Alm 3 11111 Bergheim Bundesrepublik Deutschland


    Target-Table
    Band|ID|Street|Street-Nr|ZIP-Code|City|Country
    Band, Blatt: n.n.|1111|Hax-Basalt-Str.|13a|66666|Frankfurt|Bundesrepublik Deutschland
    Band, Blatt: n.n.|2222|Kirchburg Str.|16|66666|Frankfurt|Bundesrepublik Deutschland
    Band, Blatt: n.n.|3333|Amselweg|11|66666|Frankfurt|Bundesrepublik Deutschland
    Band, Blatt: n.n.|4444|Zitronenweg|55|99999|Sankt Augustin|Bundesrepublik Deutschland
    Band, Blatt: n.n.|5555|Auf der Alm|3|11111|Bergheim|Bundesrepublik Deutschland


    thanks a lot for helping
     
  2. krasnoslobodtsev_si

    krasnoslobodtsev_si Forum Genius

    Messages:
    626
    Likes Received:
    129
    Trophy Points:
    810
    Location:
    Russian Federation
    It is possible to carry out analysis and to SQL, but on large volumes it is better for PL/SQL to use.
    For parsing of the csv file it is better to use: dbms_lob, utl_file - for obtaining all data or a portion for parsing.

    simple example parsing :

    Code (SQL):

    SET serveroutput ON
    DECLARE
     l_clob CLOB := 'Band, Blatt: n.n., 1111 Adresse: Hax-Basalt-Str. 13a 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 2222 Adresse: Kirchburg Str. 16 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 3333 Adresse: Amselweg 11 666666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
    Band, Blatt: n.n., 5555Adresse: Auf der Alm 3 11111 Bergheim Bundesrepublik Deutschland'
    ;
     l_str  varchar2(32767);
     l_ii   pls_integer := 1;
     l_ij   pls_integer := 1;
     BEGIN
         loop
             l_str := regexp_substr(l_clob,'^(.)+?$',1,l_ii,'m');
             exit WHEN l_str IS  NULL ;
             l_ii:= l_ii +1;
             dbms_output.put_line(l_str);
       
             l_ij := regexp_count(l_str,'[^,]+');
             dbms_output.put_line('Count elements = '||l_ij);    
             FOR z IN 1 .. l_ij
             loop            
                 dbms_output.put_line('   '||regexp_substr(l_str,'[^,]+',1,z));
             END loop;          
         END loop;    
    END;    
    /

    SQL>
     
    Band, Blatt: n.n., 1111 Adresse: Hax-Basalt-Str. 13a 66666 Frankfurt Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        1111 Adresse: Hax-Basalt-Str. 13a 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 2222 Adresse: Kirchburg Str. 16 66666 Frankfurt Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        2222 Adresse: Kirchburg Str. 16 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 3333 Adresse: Amselweg 11 666666 Frankfurt Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        3333 Adresse: Amselweg 11 666666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
    Band, Blatt: n.n., 5555Adresse: Auf der Alm 3 11111 Bergheim Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        5555Adresse: Auf der Alm 3 11111 Bergheim Bundesrepublik Deutschland
     
    PL/SQL PROCEDURE successfully completed
     

     
    Here links which can help :

    http://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_regexp.htm#g1015343

    http://www.oracle-base.com/articles/misc/regular-expressions-support-in-oracle.php

    http://www.oradev.com/regular_expressions_sql_oracle.jsp
     
  3. mr-sansibar

    mr-sansibar Active Member

    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    55
    Thanks a lot Sergey,
    but can you show how i can parse the rest of the string, for example:
    4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
    in street=Zitronenweg
    street-numer=55
    zip-code=99999
    city=Sankt Augustin
    country=Bundesrepublik Deutschland

    greatings ├╝mit
     
  4. krasnoslobodtsev_si

    krasnoslobodtsev_si Forum Genius

    Messages:
    626
    Likes Received:
    129
    Trophy Points:
    810
    Location:
    Russian Federation
    As in an address line as a separator only gap character also it is impossible to separate the city from the country,
    then it is possible to use manual receiving elements of an address line :


    Code (SQL):

    --SET serveroutput ON
    DECLARE
     l_clob CLOB := 'Band, Blatt: n.n., 1111 Adresse: Hax-Basalt-Str. 13a 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 2222 Adresse: Kirchburg Str. 16 66666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 3333 Adresse: Amselweg 11 666666 Frankfurt Bundesrepublik Deutschland
    Band, Blatt: n.n., 4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
    Band, Blatt: n.n., 5555Adresse: Auf der Alm 3 11111 Bergheim Bundesrepublik Deutschland'
    ;
     l_str  varchar2(32767);
     l_ii   pls_integer := 1;
     l_ij   pls_integer := 1;
     BEGIN
         loop
             l_str := regexp_substr(l_clob,'^(.)+?$',1,l_ii,'m');
             exit WHEN l_str IS NULL ;
             l_ii:= l_ii +1;
             dbms_output.put_line(l_str);
     
             l_ij := regexp_count(l_str,'[^,]+');
             
             
             dbms_output.put_line('Count elements = '||l_ij);    
             
             FOR z IN 1 .. l_ij
             loop            
                 dbms_output.put_line('   '||regexp_substr(l_str,'[^,]+',1,z));                          
             END loop;          
             
             --Here we will sort an address line
             l_str := regexp_substr(l_str,'[^,]+',1,l_ij);
             
             l_ij :=  nvl(regexp_count(l_str,'\S+'),0);
             dbms_output.put_line('      street:'||regexp_substr(l_str,'\S+',1,3));                          
             dbms_output.put_line('      steert number:'||regexp_substr(l_str,'\S+',1,4));
             dbms_output.put_line('      zip code:'||regexp_substr(l_str,'\S+',1,5));
             dbms_output.put_line('      city:'||regexp_substr(l_str,'\S+',1,6));        
             dbms_output.put     ('      country:'||regexp_substr(l_str,'\S+',1,7)||' ');        
             dbms_output.put_line(regexp_substr(l_str,'\S+',1,8));        
             
                 
             
         END loop;    
    END;    
    /


    SQL>
    Band, Blatt: n.n., 1111 Adresse: Hax-Basalt-Str. 13a 66666 Frankfurt Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        1111 Adresse: Hax-Basalt-Str. 13a 66666 Frankfurt Bundesrepublik Deutschland
          street:Hax-Basalt-Str.
          steert NUMBER:13a
          zip code:66666
          city:Frankfurt
          country:Bundesrepublik Deutschland
    Band, Blatt: n.n., 2222 Adresse: Kirchburg Str. 16 66666 Frankfurt Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        2222 Adresse: Kirchburg Str. 16 66666 Frankfurt Bundesrepublik Deutschland
          street:Kirchburg
          steert NUMBER:Str.
          zip code:16
          city:66666
          country:Frankfurt Bundesrepublik
    Band, Blatt: n.n., 3333 Adresse: Amselweg 11 666666 Frankfurt Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        3333 Adresse: Amselweg 11 666666 Frankfurt Bundesrepublik Deutschland
          street:Amselweg
          steert NUMBER:11
          zip code:666666
          city:Frankfurt
          country:Bundesrepublik Deutschland
    Band, Blatt: n.n., 4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        4444 Adresse: Zitronenweg 55 99999 Sankt Augustin Bundesrepublik Deutschland
          street:Zitronenweg
          steert NUMBER:55
          zip code:99999
          city:Sankt
          country:Augustin Bundesrepublik
    Band, Blatt: n.n., 5555Adresse: Auf der Alm 3 11111 Bergheim Bundesrepublik Deutschland
    COUNT elements = 3
       Band
        Blatt: n.n.
        5555Adresse: Auf der Alm 3 11111 Bergheim Bundesrepublik Deutschland
          street:der
          steert NUMBER:Alm
          zip code:3
          city:11111
          country:Bergheim Bundesrepublik

     


    Pay attention: if to use other separator in an address line, for example: which will single-digit separate components