1. Get rid of all advertisements and get unlimited access to documents by upgrading to Premium Membership. Upgrade to Premium Now and also get a Premium Badge!

Oracle database

Discussion in 'SQL PL/SQL' started by Rameshkumar, Jun 19, 2017.

  1. Rameshkumar

    Rameshkumar Newly Initiated

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    30
    Location:
    hong kong
    - There are 3 big files, 1GB (file_a.txt), 10GB (file_b.txt) and 1TB (file_c.txt);
    - The format of these 3 files: each line with a random string in the file;
    - There is only 100MB memory could be used, disk usage is not limited;
    - Assumption:
    IF AND ONLY IF string A appears within all 3 files, we need to count the total appearing times of this A. Such as, A appears 2 times within file_a.txt, appears 10 times within file_b.txt, appears 100 times within file_c.txt, then we count the total appearing times of A as 2 + 10 + 100 = 112 times.

    Question: please write a program to output the strings with TOP 10 and LAST 10 appearing times in descending order.
     
  2. krasnoslobodtsev_si

    krasnoslobodtsev_si Forum Genius

    Messages:
    724
    Likes Received:
    143
    Trophy Points:
    830
    Location:
    Russian Federation
    What is your oracle version ?

    You can use the following algorithm :1) describe an external tables 2) write a general query , in which the intermediate table derived units : the key system , the number of occurrences of the next step and join the tables.

    example pseudo-query for example:
    Code (Text):

    with
     tab_file1 ( select str ,count(*)  cnt from extrn_file1 group by str),
     tab_file2 ( select str ,count(*)  cnt from extrn_file2 group by str),
     tab_file3 ( select str ,count(*) cnt from extrn_file3 group by str),
    pre_result as (
    select
      f1.str,
      f1.cnt+f2.cnt+f3.cnt   as sum_cnt
    from  tab_file1 f1
      join tab_file2 f2.str = f1.str
      join tab_file3 f3.str = f2.str
    ),
    result as (
    select
      pr.str,
      pr.cnt,
      row_number() over(order by  pr.sum_cnt) rn ,
      row_number() over(order by  pr.sum_cnt desc) rn1,
    from pre_result pr
    )
    select
       rs.str,
       rs.cnt
    from result rs
    where (rs.cnt <= 10 or rs.cnt1 <= 10);
     
     
  3. Rameshkumar

    Rameshkumar Newly Initiated

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    30
    Location:
    hong kong
     
  4. Rameshkumar

    Rameshkumar Newly Initiated

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    30
    Location:
    hong kong
    Thanks for your reply..

    Why don't use UTL_FILE with Analytical functions..

    Regards,
    Ramesh
     
  5. krasnoslobodtsev_si

    krasnoslobodtsev_si Forum Genius

    Messages:
    724
    Likes Received:
    143
    Trophy Points:
    830
    Location:
    Russian Federation

    Solutions may be different(may be UTL_FILE with pipe-lined function,UTL_FILE and collection[limit clause]) and etc, but when we use external table - there is an advantage : you work with files as tables ,the file I/o is controlled by the DBMS.
    Also, you can use parallel reading, etc.
     
    Last edited: Jun 19, 2017
  6. Rameshkumar

    Rameshkumar Newly Initiated

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    30
    Location:
    hong kong
     
  7. krasnoslobodtsev_si

    krasnoslobodtsev_si Forum Genius

    Messages:
    724
    Likes Received:
    143
    Trophy Points:
    830
    Location:
    Russian Federation
  8. Rameshkumar

    Rameshkumar Newly Initiated

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    30
    Location:
    hong kong
    Thank yo so much.
     
  9. Rameshkumar

    Rameshkumar Newly Initiated

    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    30
    Location:
    hong kong
    Thank you so much.