• row_number()
• rank()
• dense_rank()
• lag(value any [, offset integer [, DEFAULT any ]])
• lead(value any [, offset integer [, DEFAULT any ]])
• first_value(value any)
• last_value(value any)
• nth_value(value any, nth integer)
• percent_rank()
• cume_dist()
• ntile(num_buckets integer)
you could find analytic function marked over green posted here,
here let look on remaining functions.
FIRST_VALUE and LAST_VALUE function
Syntax
FIRST_VALUE() OVER ( ) or LAST_VALUE() OVER()
The FIRST_VALUE analytic function picks the first record
from the partition after doing the ORDER BY. Theis computed
on the columns of this first record and results are returned.
The LAST_VALUE function is used in similar context except
that it acts on the last record of the partition. But it not
picking up the right record instead its pick up same data
what salary column picks.
example
don=# select person, dept, salary,first_value(salary)
over(partition by dept order by salary desc)-salary “ salary diff ”
from test;
person | dept | salary | diff by 1st record
-----------+------------+--------+-----------------------
phoebe | accounting | 3700 | 0
lincoln | accounting | 2950 | 750
rachel | accounting | 2950 | 750
benjamin | accounting | 2100 | 1600
michael | accounting | 1700 | 2000
paul | accounting | 1650 | 2050
alexander | it | 3700 | 0
lj | it | 3250 | 450
sara | it | 2600 | 1100
ross | it | 1250 | 2450
theodore | management | 3250 | 0
joey | management | 3250 | 0
monica | management | 2950 | 300
chandler | management | 2400 | 850
fernando | marketing | 1650 | 0
bradley | marketing | 1000 | 650
(16 rows)
in above example first_vlaue() function return first record from
the partition value after order by clause executed.
i.e in accounting partition in would return 3700 ( first after order by desc )
always, from there we could calculate difference in salary from top salary
of the partition.
All about data processing & analytics. Open source database,No-SQL and Hadoop. Discussed issues/solution which I've got during my experience..
Friday, September 24, 2010
Analytic function in postgresql - continue.....
Thursday, September 23, 2010
cluster already running - error while startup postgres postmaster
postmaster daemons create pid files when start up, remove them when the process exits. some time pid file exist even after postmaster daemons exit because the process crashed before it could remove the pid file or the system crashes and processes do not get a chance to remove their pidfiles even if they want to.
--> in this case just find out & remove postmaster pid file ( you could find out pid file in data directory of postgresql installed path)
or
--> find out the the running process id (PID) using port no & kill PID
lsof | grep 5432 --(lsof | grep PortNo) and kill the pid
kill -9 PID
or
--> find out the the running process id (PID) using process name & kill PID
ps -ef | grep postmaster --(lsof | grep PortNo) and kill the pid
kill -9 PID
Once stop the running postmaster using pid, start the postgres database server.
Function for spell out a number
Convert Number into Word in postgresql
for example
SELECT TO_CHAR(TO_DATE(12345,'J'),'JSP') FROM dual;
output like
TWELVE THOUSAND THREE HUNDRED FORTY-FIVE
in postgresql spell out (SP) pattern modifier not implemented (upto version postgresql 9.0)
here i have posted function to convert given number into word, lets look
create or replace function fn_spellout(p_num varchar) returns varchar as
$$
declare
v_word varchar(50);
v_spell varchar(1000) := '';
v_length integer;
v_pos integer := 1;
begin
select length(p_num) into v_length;
for v_pos in 1 .. v_length loop
--while v_pos <= v_length loop
SELECT case substring(p_num,v_pos,1)
when '1' then 'ONE '
when '2' then 'TWO '
when '3' then 'THREE '
when '4' then 'FOUR '
when '5' then 'FIVE '
when '6' then 'SIX '
when '7' then 'SEVEN '
when '8' then 'EIGHT '
when '9' then 'NINE '
when '0' then 'ZERO '
ELSE 'NULL'
end into v_word;
v_spell := v_spell || v_word;
--v_pos := v_pos+1;
end loop;
return v_spell;
end;
$$ language plpgsql
while execute this function it return output like
select fn_spellout('12345')
"ONE TWO THREE FOUR FIVE "
Friday, September 17, 2010
Analytic function in postgresql
Few function in oracle personally i like much more because of simplicity to achieve what we want, one of those function is analytic/windows functions. In postgresql it works 8.4 and later version only.
Key points:
Analytic functions give aggregate result they do not group the result set.
They return the group value multiple times with each record. As such any
other non-"group by" column or expression can be present in the select clause.
Analytic functions are computed after all joins, WHERE clause,
GROUP BY and HAVING are computed on the query. The main ORDER BY clause of the query
operates after the analytic functions, So analytic functions can only appear in the
select list and in the main ORDER BY clause of the query.
The general syntax of analytic function is:
function_name ([expression [, expression ... ]]) OVER ( window_definition )
where window_definition has the syntax
[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
Let see simple example for each function
1 avg(salary) over (partition by dept)
avg() return (average salary of the partition and) group value multiple time for each record with average salary for partition.
SELECT dept,salary, avg(salary) over (partition by dept) FROM test ORDER BY dept,salary DESC;
dept | salary | avg
------------+--------+-----------------------
accounting | 3700 | 2508.3333333333333333
accounting | 2950 | 2508.3333333333333333
accounting | 2950 | 2508.3333333333333333
accounting | 2100 | 2508.3333333333333333
accounting | 1700 | 2508.3333333333333333
accounting | 1650 | 2508.3333333333333333
it | 3700 | 2700.0000000000000000
it | 3250 | 2700.0000000000000000
it | 2600 | 2700.0000000000000000
it | 1250 | 2700.0000000000000000
management | 3250 | 2962.5000000000000000
management | 3250 | 2962.5000000000000000
management | 2950 | 2962.5000000000000000
management | 2400 | 2962.5000000000000000
marketing | 1650 | 1325.0000000000000000
marketing | 1000 | 1325.0000000000000000
(16 rows)
2) rank() over(partition by dept order by salary desc)
rank() is kind-of like rownum. It returns which position given row gets when ordering by (whatever we put in “order by” clause inside “over ()”. ) . rank() is not really rownum – it’s not unique.
In case of a tie of 2 records at position N, RANK declares 2 positions N and skips position N+1 and gives position N+2 to the next record. While DENSE_RANK declares 2 positions N but does not skip position N+1.
don=# SELECT person,dept,salary, rank() over(partition by dept order by salary desc), dense_rank() over(partition by dept order by salary desc) FROM test ORDER BY dept,salary DESC;
person | dept | salary | rank | dense_rank
-----------+------------+--------+------+------------
phoebe | accounting | 3700 | 1 | 1
lincoln | accounting | 2950 | 2 | 2
rachel | accounting | 2950 | 2 | 2
benjamin | accounting | 2100 | 4 | 3
michael | accounting | 1700 | 5 | 4
paul | accounting | 1650 | 6 | 5
alexander | it | 3700 | 1 | 1
lj | it | 3250 | 2 | 2
sara | it | 2600 | 3 | 3
ross | it | 1250 | 4 | 4
theodore | management | 3250 | 1 | 1
joey | management | 3250 | 1 | 1
monica | management | 2950 | 3 | 2
chandler | management | 2400 | 4 | 3
fernando | marketing | 1650 | 1 | 1
bradley | marketing | 1000 | 2 | 2
(16 rows)
3) row_number() over(partition by deptno order by hiredate desc)
ROW_NUMBER( ) gives a running serial number to a partition of records. It is very useful in reporting, especially in places where different partitions have their own serial numbers.
don=# select dept,salary,row_number() over ( partition by dept order by salary desc) from test;
dept | salary | row_number
------------+--------+------------
accounting | 3700 | 1
accounting | 2950 | 2
accounting | 2950 | 3
accounting | 2100 | 4
accounting | 1700 | 5
accounting | 1650 | 6
it | 3700 | 1
it | 3250 | 2
it | 2600 | 3
it | 1250 | 4
management | 3250 | 1
management | 3250 | 2
management | 2950 | 3
management | 2400 | 4
marketing | 1650 | 1
marketing | 1000 | 2
(16 rows)4)
4) count(salary) over( partition by dept)
same like avg(), but it return count value for each partition
don=# select dept,salary,count(*) over ( partition by dept order by salary desc) from test;
dept | salary | count
------------+--------+-------
accounting | 3700 | 6
accounting | 2950 | 6
accounting | 2950 | 6
accounting | 2100 | 6
accounting | 1700 | 6
accounting | 1650 | 6
it | 3700 | 4
it | 3250 | 4
it | 2600 | 4
it | 1250 | 4
management | 3250 | 4
management | 3250 | 4
management | 2950 | 4
management | 2400 | 4
marketing | 1650 | 2
marketing | 1000 | 2
(16 rows)
5) LEAD and LAG
LEAD has the ability to compute an expression on the next rows (rows which are going to come after the current row) and return the value to the current row. The general syntax of LEAD is shown below:
LEAD (
SELECT deptno, empno, sal,
LEAD(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
LAG(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp WHERE deptno IN (10, 20) ORDER BY deptno, sal DESC;
DEPTNO EMPNO SAL NEXT_LOWER_SAL PREV_HIGHER_SAL
------- ------ ----- -------------- ---------------
10 7839 5000 2450 0
10 7782 2450 1300 5000
10 7934 1300 0 2450
20 7788 3000 3000 0
20 7902 3000 2975 3000
20 7566 2975 1100 3000
20 7876 1100 800 2975
20 7369 800 0 1100
8 rows selected.
full query
SELECT person, dept, salary, avg(salary) over (partition by dept), rank() over(partition by dept order by salary desc),row_number() over(partition by dept order by salary desc) , count(salary) over( partition by dept ) FROM test ORDER BY dept,salary DESC ;
Thursday, September 16, 2010
Execute procedure in oracle10G EX
let see simple example.............
creating procedure
CREATE OR REPLACE PROCEDURE sp_addVal (var1IN NUMBER, var2 IN NUMBER, var3 OUT NUMBER)
IS
BEGIN
var3:=var1+var2;
END;
executing procedure
DECLARE
v_sum NUMBER;
BEGIN
sp_addVal(50,50,v_sum);
DBMS_OUTPUT.PUT_LINE('output of test is '||v_sum);
END;
[
other then oracle10g EX simply call the procedure like
VARIABLE v_sum NUMBER;
EXEC sp_addVal(50,50,v_sum)
]
output
output of test is 100
Sunday, September 5, 2010
explicit type converstion in postgresql
1) to_timestamp -- used to convert bigint to timestamp with or with out timezone.
2) epoch -- used to convert timestamp to bigint value.
here see the example
1) to_timestamp
SELECT to_timestamp(1283437580); -- it returns the output like
to_timestamp
timestamp with time zone
----------------------
"2010-09-02 19:56:20+05:30"
want to display the timestamp with out timezone
SELECT to_timestamp(1283437580)::timestamp;
to_timestamp
timestamp with time zone
----------------------
"2010-09-02 19:56:20"
2) epoch
SELECT EXTRACT(epoch FROM current_timestamp(0)) -- it return output like
date_part
double precision
-------------------
1283753630
-- more explicit type conversion will update here...
Labels
- #agriculture (1)
- #ai (5)
- #pdf (1)
- Big Data (30)
- blogging (3)
- data analytics (5)
- data science (7)
- Deep Learning (4)
- Hadoop (28)
- Hadoop Eco System (27)
- hdfsCommands (4)
- Hive (5)
- IssueSolution (4)
- jobs (3)
- links (1)
- Machine Learning (4)
- mahout (2)
- MapReduce (1)
- MongoDB (6)
- MySQL (4)
- PlpgSQL (8)
- postgres (6)
- PostgreSQL (53)
- R (20)
- RHadoop (2)
- search keywords (1)
- social (2)
- spark (3)
- twitter (6)
- usecase (2)
- visualization (7)
- weka (1)