Just sharing, no big deal!

~~'Distinct' is the same as 'Group By'.~~

'Distinct' may achieve the same result as 'Group By' depending on the data.

T-SQL_1 and T-SQL_2 return the same result

This is a valid t-sql, except that it actually will break the distinct selection of cUserID.

~~T-SQL_4 and T-SQL_5 return same result.~~

T-SQL_4 and T-SQL_5 may return same result or may not depending on the data of cDate,cDeviceID,cUsername.

This returns error message: Column 'TBL_Raw_Trans_201302.cUserID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

It can only work if you change it to become T-SQL_5.

~~In other words, 'Group By' actually means 'Distinct'.~~

'Distinct' may achieve the same result as 'Group By' depending on the data.

'Distinct' will result in nested results for non-distinct selection (column).

In other words, 'Distinct' only performs 'Group By' for the column which is selected for 'Distinct'.

T-SQL_7_1 - Full table selection:

T-SQL_7_2:

In this scenario, T-SQL_7_1 returns the same result as T-SQL_7_2

T-SQL_7_3:

T-SQL_7_4:

T-SQL_7_3 is the same as T-SQL_7_4 - these two queries will always yield the same result irregardless of the data.

T-SQL_7_5:

T-SQL_7_6:

T-SQL_7_5 is the same as T-SQL_7_6 - these two queries will always yield the same result irregardless of the data.

In retrospect, the official usage of 'Distinct' defined by Microsoft is that it is meant to return a one-column table

that contains the distinct values from the specified column.

http://technet.microsoft.com/en-us/library/ee634943.aspx

Hence, how is distinct really useful in a multiple column selection scenario ?

It can be used to identify differences in columns' data.

Using the following query, we can identify that cUserID has 3 different associated data with respect to cDate.

and with the following query, cUserID has 2 different associated data with respect to cDeviceID.

In other words, if you really want to return a table with multi column selection, use 'Group By' instead of 'Distinct'.

'Distinct' may achieve the same result as 'Group By' depending on the data.

**1.) T-SQL_1:**SELECT cUserID,max(cDate),max(cDeviceID),max(cUsername) FROM Trans_Table Group by cUserID Order by cUserID

**2.) T-SQL_2:**SELECT distinct(cUserID),max(cDate),max(cDeviceID),max(cUsername) FROM Trans_Table Group by cUserID Order by cUserID

T-SQL_1 and T-SQL_2 return the same result

**3.) T-SQL_3:**SELECT distinct(cUserID),max(cDate),max(cDeviceID),max(cUsername) FROM Trans_Table Group by cUserID,cDate Order by cUserID

This is a valid t-sql, except that it actually will break the distinct selection of cUserID.

**4.) T-SQL_4:**SELECT distinct(cUserID),cDate,cDeviceID,cUsername FROM Trans_Table Order by cUserID

**5.) T-SQL_5:**SELECT distinct(cUserID),min(cDate),min(cDeviceID),min(cUsername) FROM Trans_Table Group by cUserID Order by cUserID

T-SQL_4 and T-SQL_5 may return same result or may not depending on the data of cDate,cDeviceID,cUsername.

**6.) T-SQL_6:**SELECT distinct(cUserID),min(cDate),min(cDeviceID),min(cUsername) FROM Trans_Table Order by cUserID

This returns error message: Column 'TBL_Raw_Trans_201302.cUserID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

It can only work if you change it to become T-SQL_5.

'Distinct' may achieve the same result as 'Group By' depending on the data.

'Distinct' will result in nested results for non-distinct selection (column).

In other words, 'Distinct' only performs 'Group By' for the column which is selected for 'Distinct'.

**Case Study:**T-SQL_7_1 - Full table selection:

SELECT cUserID,cDate,cDeviceID,cUsername FROM Trans_Table order by cUserID

T-SQL_7_2:

SELECT distinct(cUserID),cDate,cDeviceID,cUsername FROM Trans_Table order by cUserID

In this scenario, T-SQL_7_1 returns the same result as T-SQL_7_2

T-SQL_7_3:

SELECT distinct(cUserID),min(cDate),min(cDeviceID),min(cUsername) FROM Trans_Table Group by cUserID Order by cUserID

T-SQL_7_4:

SELECT cUserID,min(cDate),min(cDeviceID),min(cUsername) FROM Trans_Table Group by cUserID Order by cUserID

T-SQL_7_3 is the same as T-SQL_7_4 - these two queries will always yield the same result irregardless of the data.

T-SQL_7_5:

SELECT distinct(cUserID),max(cDate),max(cDeviceID),max(cUsername) FROM Trans_Table Group by cUserID Order by cUserID

T-SQL_7_6:

SELECT cUserID,max(cDate),max(cDeviceID),max(cUsername) FROM Trans_Table Group by cUserID Order by cUserID

T-SQL_7_5 is the same as T-SQL_7_6 - these two queries will always yield the same result irregardless of the data.

In retrospect, the official usage of 'Distinct' defined by Microsoft is that it is meant to return a one-column table

that contains the distinct values from the specified column.

http://technet.microsoft.com/en-us/library/ee634943.aspx

Hence, how is distinct really useful in a multiple column selection scenario ?

It can be used to identify differences in columns' data.

Using the following query, we can identify that cUserID has 3 different associated data with respect to cDate.

SELECT distinct(cUserID),cDate,cDeviceID,cUsername FROM Trans_Table order by cUserID

and with the following query, cUserID has 2 different associated data with respect to cDeviceID.

SELECT distinct(cUserID),cDeviceID,cUsername FROM Trans_Table order by cUserID

In other words, if you really want to return a table with multi column selection, use 'Group By' instead of 'Distinct'.

## Comments