[SQL] 15 days of learning

[SQL] 15 days of learning

Problem Statement

Julia conducted a 15 days of learning SQL contest. The start date of the contest was March 01, 2016 and the end date was March 15, 2016.

Write a query to print total number of unique hackers who made at least 1 submission each day (starting on the first day of the contest), and find the hacker_id and name of the hacker who made maximum number of submissions each day. If more than one such hacker has a maximum number of submissions, print the lowest hacker_id. The query should print this information for each day of the contest, sorted by the date.

Sample Output

2016-03-01 4 20703 Angela
2016-03-02 2 79722 Michael
2016-03-03 2 20703 Angela
2016-03-04 2 20703 Angela
2016-03-05 1 36396 Frank
2016-03-06 1 20703 Angela

Find the rest of the information through here: LINK

[Solving problem]: Understanding the Data:

Our data lives in 2 tables:

Hackers (hacker_id, name)
Submissions (submission_date, submission_id, hacker_id, score)

Breaking down the solution

The first step is to identify participants who submitted their work every day starting from the contest’s first day. To do this, we use window functions to rank submission dates globally and per hacker. The query snippet below shows how this is achieved:

SELECT submission_date, hacker_id,
DENSE_RANK() OVER(ORDER BY submission_date) as subDateRank,
DENSE_RANK() OVER(PARTITION BY hacker_id ORDER BY submission_date) as hacIdRank
FROM submissions

DENSE_RANK is a function which assigns a rank based on the order of dates.

  • subDateRank counts the days of the contest.
  • hacIdRank counts the submission days for each hacker

For example, consider the participation of two contestants, Angela (20703) and Michael (79722):

Angela's participation pattern:

| submission_date | submission_id | subDateRank | hacIdRank |
|---|---|---|---|
| 2016-03-01 | 20703 | 1 | 1 |
| 2016-03-02 | 20703 | 2 | 2 |
| 2016-03-03 | 20703 | 3 | 3 |
| 2016-03-04 | 20703 | 4 | 4 |
| 2016-03-05 | 20703 | 5 | 5 |
| 2016-03-06 | 20703 | 6 | 6 |

Michael's participation pattern:

| submission_date | submission_id | subDateRank | hacIdRank |
|---|---|---|---|
| 2016-03-01 | 79722 | 1 | 1 |
| 2016-03-02 | 79722 | 2 | 2 |
| 2016-03-03 | 79722 | 3 | 3 |
| 2016-03-04 | 79722 | 4 | 4 |
| 2016-03-05 | 79722 | 5 | null |
| 2016-03-06 | 79722 | 6 | null |

By comparing subDateRank with hacIdRank, we can confirm whether a hacker's submitted their work every starting from day 1.

The second part would be to calculate the total number of unique hackers who maintained a consecutive submission. We therefore use COUNT and DISTINCT for this purpose.

SELECT 
submission_date, 
COUNT(DISTINCT hacker_id) as TotalHackers
FROM    etc    -------
WHERE subDateRank = hacIdRank
GROUP BY submission_date

Next, we need to determine the hacker with the highest number of submissions each day. We can achieve this by grouping submissions by date and then ranking the hackers using the RANK window function.

RANK() 
OVER(PARTITION BY sub.submission_date ORDER BY COUNT(*) DESC, hac.hacker_id ASC) 
    AS hackerRank

Finally, we join the results from MaxSubmissions with UniqueSubmissions. This gives us, for each day, the submission date, the total number of unique hackers with consecutive participation, the hacker id and hacker name. Then, we filter to only include the highest ranked participator.

SELECT maxSub.submission_date, uniqueSub.TotalHackers, maxSub.hackerId, maxSub.hackerName
FROM MaxSubmissions AS maxSub
JOIN UniqueSubmissions AS uniqueSub
ON maxSub.submission_date = uniqueSub.submission_date
WHERE hackerRank = 1
ORDER BY maxSub.submission_date

Full code:

/*
Enter your query here.
1st march 2016 to 15th march 2016

total no of unique hackers who made atleast 1 submission each day 
*/

WITH UniqueSubmissions AS (
    SELECT 
        submission_date, 
        COUNT(DISTINCT hacker_id) as TotalHackers
    FROM (
        SELECT submission_date, hacker_id,
        DENSE_RANK() OVER(ORDER BY submission_date) as subDateRank,
        DENSE_RANK() OVER(PARTITION BY hacker_id ORDER BY submission_date) as hacIdRank
        FROM submissions
    ) temp
    WHERE subDateRank = hacIdRank
    GROUP BY submission_date
),
/*
 find the hacker_id and name of the hacker 
 who made maximum number of submissions each day
*/
MaxSubmissions AS
(
    SELECT sub.submission_date as submission_date, hac.hacker_id as hackerId, hac.name as hackerName, COUNT(*) AS submission_count,
    RANK() 
    OVER(PARTITION BY sub.submission_date ORDER BY COUNT(*) DESC, hac.hacker_id ASC) 
    AS hackerRank
    FROM HACKERS AS hac
    JOIN Submissions AS sub
    ON hac.hacker_id = sub.hacker_id
    GROUP BY sub.submission_date, hac.hacker_id, hac.name
)
SELECT maxSub.submission_date, uniqueSub.TotalHackers, maxSub.hackerId, maxSub.hackerName
FROM MaxSubmissions AS maxSub
JOIN UniqueSubmissions AS uniqueSub
ON maxSub.submission_date = uniqueSub.submission_date
WHERE hackerRank = 1
ORDER BY maxSub.submission_date
;

This solution has a lot of SQL concepts, including; Joins, Window Functions, Subqueries and CTEs.

So I hope that you have enjoyed learning something new today. If you would like a deeper dive into any of these topics, whether through a blog post or a YouTube video, feel free to leave a comment. Thank you ;)


About Me

I am Zaahra, a Google Women Techmakers Ambassador who enjoy mentoring people and writing about technical contents that might help people in their developer journey. I also enjoy building stuffs to solve real life problems.

To reach me:

LinkedIn: https://www.linkedin.com/in/faatimah-iz-zaahra-m-0670881a1/

X (previously Twitter): _fz3hra

GitHub: https://github.com/fz3hra

Cheers,

Umme Faatimah-Iz-Zaahra Mujore | Google Women TechMakers Ambassador | Software Engineer