SQL question

nitro · Oct 9, 2010

Say I have a table that has three columns,

dateReg, email, age.

I want to find duplicates and display all the columns. If I do this it shows duplicate emails, just showing that column:
Code:
SELECT email, 
 COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
But if I say
Code:
Select email, dateReg,
...
and the rest of the code below this line is the same, it gives me an error

Msg 8120, Level 16, State 1, Line 1
Column dateReg' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

How do I display duplicates, but with all columns of the duplicate entry showing?

Rodney King · Oct 9, 2010

I assume you've pinged http://stackoverflow.com/ - ?

nitro · Oct 9, 2010

Duh!
Code:
.. group by email, dateReg having(count(email) > 1)
Sorry I thought I tried this and failed.

Getting tired...

nitro · Oct 9, 2010

Quote from nitro:

Duh!

Code:

.. group by email, dateReg having(count(email) > 1)

Sorry I thought I tried this and failed.

Getting tired...
More...
Wait, that doesn't work! It only shows a fraction of the number of duplicates!

F****g SQL man.

nitro · Oct 9, 2010

I understand what the problem is. Since it can only return one row, but there are multiple occurrences, which of the rest of the row is it supposed to display since they are all different?

This is ridiculous. Maybe I will just read the table into C# and do it all there. The problem is that this is the sort of thing why I was using SQL!!! LMAO

nitro · Oct 9, 2010

Say you are storing news in a database, and each story is indexed by date:time. How in SQL do you return all the columns (one column being the actual text) pertaining to a given day regardless of time?

This has to be easy!

nitro · Oct 9, 2010

This template got me closer, but still no cigar

Code:

SELECT Delegate.DelegateID, Delegate.FirstName, Delegate.LastName 
       FROM Delegate INNER JOIN 
             (SELECT FirstName, LastName 
                    FROM Delegate 
                    GROUP BY FirstName, LastName 
                    HAVING Count(DelegateID) > 1 
             )     AS Duplicates 
             ON Delegate.FirstName = Duplicates.FirstName 
             AND Delegate.LastName = Duplicates.LastName

nitro · Oct 9, 2010

Quote from nitro:

This template got me closer, but still no cigar

Code:

SELECT Delegate.DelegateID, Delegate.FirstName, Delegate.LastName FROM Delegate INNER JOIN (SELECT FirstName, LastName FROM Delegate GROUP BY FirstName, LastName HAVING Count(DelegateID) > 1 ) AS Duplicates ON Delegate.FirstName = Duplicates.FirstName AND Delegate.LastName = Duplicates.LastName
More...
Got it. Remove the second field from the temp table Duplicates. Strange, I don't know why that works, but I don't care for now.

dhpar · Oct 9, 2010

this is ok - basically creating intermediate view on the fly.

note that your request does not make much sense from the dba standpoint...
why to show all duplicate rows in the first place? why not just one record for each duplication?...maybe with a counter column added?

and that's after ignoring issue of indexes and primary keys...

p.s.
also you should label all fields and tables with aliases in order not to mess up...

nitro · Oct 9, 2010

Quote from dhpar:

this is ok - basically creating intermediate view on the fly.

note that your request does not make much sense from the dba standpoint...
why to show all duplicate rows in the first place? why not just one record for each duplication?...maybe with a counter column added?

and that's after ignoring issue of indexes and primary keys...

p.s.
also you should label all fields and tables with aliases in order not to mess up...
More...

I am not sure I follow. Are you saying that it makes no sense to show duplicate rows? In the example I gave before, it makes perfect sense. Say you are storing news items in a database. Say your columns are

GUID(PK), DateTime, Headline, NewsStory, Author.

Say you want to return all the news stories and the author for a single day by a given author. Clearly this is a table that is a bunch of duplicate entries.

Say you want to see all the rows that have multiple authors for a single day (any day),where select shows only the date part (not time) and the author.