Assignment 3: Billboard Top 100
Corrections
- Oct 18: Corrected the expected output for the number of songs advancing in rank to:
Songs advancing in rank wrt previous week: 181
Intro
We all love our favourite music! Inspired by the CMPT 120 course, you and your friends want to get ideas for concerts to attend by exploring data, and here is such an opportunity: you will analyze a data file containing the Billboard weekly top 100 songs. We will only use a partial data set (May 2024), but you can play with the full data set (1958-2024). (The file was initially downloaded from Kaggle.)
Data File
Start by creating a new folder for this assignment, then download the CSV data file hot-100-May2024.csv (linked above) into this folder. It contains the song chart information for May 2024. For clarity, this document refers to the data file's column names in italics.
- The first row has the title for each column:
  - chart_week is the week for this data point; the file includes multiple weeks of data.
  - current_week is the song's position on the chart this week.
  - last_week is the song's ranking in the previous week.
  - peak_pos is the highest rank the song has reached so far (not including future weeks).
  - wks_on_chart is the number of weeks the song has been ranked in the top 100 so far (not including future weeks).
- Each line after the first (i.e., after the header) contains data about a particular song/artist.
  - For example, the first data line shows that "Fortnight" by Taylor Swift is in its first week on the chart. By May 25th (its 4th week on the chart) the song had fallen to 8th.
- Many songs appear several times in the file (even in the partial file provided here), because there are rankings for each week and many songs appear in many weeks. Likewise, the same artist may appear multiple times, even in the same week, since the same or different songs by that artist may be on the chart.
  - For example, Taylor Swift's album "The Tortured Poets Department" was released on April 19th and earned her numerous spots on the chart!
Have a careful look through the data file and make sure you understand how it is structured before starting the exercise. Also carefully check the sample outputs which complement the exercise description.
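Because the csv module may not be imported (see the General Notes), each line has to be split into fields by hand. A plain line.split(",") breaks if a song title or artist name contains a comma. The sketch below assumes, as is common in CSV files but not stated in this assignment, that such fields are wrapped in double quotes. The helper names split_csv_line and parse_rows are hypothetical, not part of any library.

```python
# Sketch: split CSV lines by hand, since the csv module may not be imported.
# ASSUMPTION: a field containing commas is wrapped in double quotes, and ""
# inside such a field stands for one literal quote (common CSV practice).


def split_csv_line(line):
    """Return the list of fields in one CSV line."""
    line = line.rstrip("\r\n")
    fields = []
    current = ""
    in_quotes = False
    i = 0
    while i < len(line):
        char = line[i]
        if in_quotes:
            if char == '"' and i + 1 < len(line) and line[i + 1] == '"':
                current += '"'  # doubled quote inside a quoted field
                i += 1
            elif char == '"':
                in_quotes = False  # closing quote
            else:
                current += char
        elif char == '"':
            in_quotes = True  # opening quote
        elif char == ",":
            fields.append(current)  # a comma outside quotes ends a field
            current = ""
        else:
            current += char
        i += 1
    fields.append(current)
    return fields


def parse_rows(lines):
    """Turn a header line plus data lines into a list of dicts keyed by column name."""
    header = split_csv_line(lines[0])
    rows = []
    for line in lines[1:]:
        if line.strip() == "":
            continue  # ignore blank lines, e.g. at the end of the file
        rows.append(dict(zip(header, split_csv_line(line))))
    return rows
```

For example, parse_rows(data_file.readlines()) would give one dict per data line, so a value can be looked up by column name (row["current_week"]) instead of by a hard-coded position.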
This exercise is broken down into two parts, and Part 2 has two queries; all of them should be included, in order, in a single Python file.
Part 1: Overall Data
Your program should output the following information (in this order):
- A greeting message to the user.
- Aggregate information over all the songs in the provided file (i.e., the May 2024 information):
  - the number of songs with the word "love" in the title (case-insensitive).
  - the names of the songs ranked 1 or 2.
  - the names of artists whose names start with 'A' (case-insensitive).
  - the number of songs that advanced in rank with respect to the previous week (their current rank number is smaller than their last-week rank number). Rank #1 is the highest possible rank. A song that appears on the chart for the first time counts as having advanced in rank that week.
  - the average weeks on board (wks_on_chart) of all songs in the provided file, to 2 decimal places.
You should NOT check if the songs or the artists are repeated. That is, if a song or artist is repeated in the file, consider these as separate entries.
Hints for Part 1
- You should be able to solve this part with a single loop over the file. Plan ahead which variables and lists you will need. You may have other loops after you have visited all lines of the file (for example, to print a short list of names).
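The single-loop approach for Part 1 can be sketched as follows. This is one possible layout under stated assumptions, not the required solution: rows are dicts keyed by column name, the title and artist columns are assumed to be named "title" and "performer" (check the real header), and a song's first week is assumed to have an empty or 0 last_week value. The "love" check is a plain substring test, so titles such as "Lover" also count; compare your result against the sample output.

```python
# Sketch: all Part 1 statistics gathered in ONE pass over the rows.
# ASSUMPTIONS: each row is a dict keyed by column name, the title and
# artist columns are named "title" and "performer" (check the real header),
# and a song's first week on the chart has an empty or 0 last_week value.


def to_int(text):
    """Convert a numeric field to an int; an empty field becomes 0."""
    text = text.strip()
    if text == "":
        return 0
    return int(float(text))  # float() first, in case numbers are stored like "12.0"


def part1_stats(rows):
    """Return (love_count, top_two_titles, a_artists, advancing_count, average_weeks)."""
    love_count = 0
    top_two_titles = []
    a_artists = []
    advancing_count = 0
    total_weeks = 0

    for row in rows:
        title = row["title"]
        artist = row["performer"]
        current_rank = to_int(row["current_week"])
        last_rank = to_int(row["last_week"])

        if "love" in title.lower():  # substring test, so "Lover" also counts
            love_count += 1
        if current_rank in (1, 2):
            top_two_titles.append(title)
        if artist.lower().startswith("a"):
            a_artists.append(artist)
        if last_rank == 0 or current_rank < last_rank:  # new entry, or moved up
            advancing_count += 1
        total_weeks += to_int(row["wks_on_chart"])

    average_weeks = total_weeks / len(rows) if rows else 0.0
    return love_count, top_two_titles, a_artists, advancing_count, average_weeks
```

The average can then be printed to 2 decimal places with an f-string such as f"Average weeks on board all songs: {average_weeks:.2f}". The lists of titles and artists are printed with a separate short loop after the main one, as the hint allows.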
Sample Output
Welcome to the Billboard top 100 app!
*************************
PART 1: Statistics & Data
*************************
Number songs containing the word 'love': 9
Song names in rank positions 1 or 2: 8
- Fortnight
- Down Bad
- Fortnight
- Million Dollar Baby
- Not Like Us
- Million Dollar Baby
- I Had Some Help
- Not Like Us
Artist names starting with 'A': 8
- Ariana Grande
- Artemas
- Ariana Grande
- Artemas
- Ariana Grande
- Artemas
- Ariana Grande
- Artemas
Songs advancing in rank wrt previous week: 181
Average weeks on board all songs: 9.21
Part 2: User Interaction
In the same file, underneath your code for Part 1, print a title and then write code for the following two queries. Each query should be done only once, in this order.
- Print a title, or some other output, to clearly separate the Part 2 output from the Part 1 output. Do NOT do this by clearing the screen or printing many empty lines; just print a couple of lines and an appropriate title. For each query, also print appropriate subtitles (see the sample runs).
- First query: Ask the user to enter an artist name (or text that could be part of the name). Then, for each song whose artist name contains that text, print the following in a table:
  - the complete artist name, the song title, the date (in the same format as in the file), the current rank and the last-week rank.
  - If the artist performs jointly with another artist, that is OK; include the song in the printout.
  - If the text typed by the user matches more than one artist, include all the songs by those artists.
  - Once again, treat repeated songs as separate entries.
  - If what the user typed is not part of any artist name, inform the user that there are no songs associated with that artist.
- Second query: Ask the user to enter a song title, or some text that could be part of a song title. Once a song with that title (or containing that text) is found, print:
  - The complete song title, the date and its weeks-on-board value. Print this only once, for the first matching song encountered as you read the file (see the hints below).
  - Then, after printing this information for the requested song, print all the songs (from the beginning of the file) that were on the board for more weeks than the requested song. For these songs, include the song title, the date and the difference in the number of weeks on the board between each song and the requested song.
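Since the General Notes allow loading the whole file into memory, both queries can be sketched over a list of rows. This is a minimal sketch, assuming rows are dicts keyed by column name, that the title and artist columns are named "title" and "performer" (check the real header), and that the user's text has already been stripped and lower-cased. The function names are hypothetical.

```python
# Sketch of the two Part 2 queries over an in-memory list of row dicts.
# ASSUMPTIONS: rows are dicts keyed by column name, the title and artist
# columns are named "title" and "performer" (check the real header), and the
# query text has already been stripped and lower-cased.


def find_artist_rows(rows, artist_query):
    """Return every row whose artist name contains artist_query."""
    matches = []
    for row in rows:
        if artist_query in row["performer"].lower():
            matches.append(row)
    return matches  # an empty list means "no such artist"


def songs_with_more_weeks(rows, title_query):
    """Return (requested_row, longer), where longer is a list of (row, extra_weeks).

    requested_row is None when no title contains title_query.
    """
    requested_row = None
    for row in rows:
        if title_query in row["title"].lower():
            requested_row = row
            break  # only the first matching song is used

    if requested_row is None:
        return None, []

    requested_weeks = int(requested_row["wks_on_chart"])
    longer = []
    for row in rows:  # scan again from the first row
        extra_weeks = int(row["wks_on_chart"]) - requested_weeks
        if extra_weeks > 0:
            longer.append((row, extra_weeks))
    return requested_row, longer
```

In the main program, an empty list from find_artist_rows is the cue to print a message such as "There is no such artist in the file.", and a None requested row is the cue for the no-match message. An empty longer list covers the case where no song has more weeks on board.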
Hints for Part 2: extra information to solve this part, and simplifying assumptions:
- For some queries you may have to process the file more than once (or store all the data in a list). To do so, when your program finishes reading the file, you can call the method seek(0) to start reading from the beginning of the file again (e.g., if data_file is the variable referring to the data file, data_file.seek(0) lets you read the file from the first character, and therefore the first line, again).
- Print data in columns using f-strings. Here are a couple of examples.
  - More details about the format method are available in the Runestone textbook. You may also explore this tutorial.
  - We suggest using a width of 25-35 characters for song titles and artist names. Names or titles wider than that will break the alignment, but that is OK.
- Your program must accept any case in the user input. For example, "SWIFT" and "swift" must be recognized as the same artist. Leading and trailing spaces and punctuation marks (including the characters .,?!) in the input should also be ignored.
- Hint for the first query in Part 2: to determine that an artist is not in the file, you need to visit the whole file. One possibility is to count the number of times the artist name is found as the file is read. Based on this count, you can print a message after reading the file if needed.
- Hints for the second query in Part 2:
  - To print only the first occurrence of the requested song, you may use the break statement inside the for loop as soon as you have printed that information. There are other ways to achieve this, which we will see later in the course.
  - You must handle the case where no songs match the user's input.
  - It is possible that no songs have more weeks on board than the requested song.
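The input-cleanup and column-formatting hints can be sketched together. The function names and the column widths (30/30/12/6/15) below are illustrative choices, not requirements. str.strip with a set of characters removes any mix of those characters from both ends, which matches the "leading and trailing" rule.

```python
# Sketch of the input-cleanup and column-formatting hints.
# The column widths (30/30/12/6/15) are illustrative choices, not requirements.


def clean_query(raw_text):
    """Drop leading/trailing whitespace and . , ? ! marks, then lower-case the text."""
    return raw_text.strip(" \t.,?!").lower()


def format_artist_header():
    """Return the title line for the first-query table."""
    return f"{'ARTIST':<30}{'SONG':<30}{'DATE':<12}{'RANK':>6}{'PREVIOUS RANK':>15}"


def format_artist_row(artist, title, date, rank, previous_rank):
    """Return one table line: text columns left-aligned, rank columns right-aligned."""
    return f"{artist:<30}{title:<30}{date:<12}{rank:>6}{previous_rank:>15}"
```

A typical use is artist_query = clean_query(input(": ")), then print(format_artist_header()) once, followed by one print(format_artist_row(...)) per matching song. Titles longer than 30 characters simply push the rest of that line to the right.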
Sample Output
Here is sample output for Part 2. User input appears after the ": " prompt.
Sample Run 1 - Part 2: "ley" and "greed"
************************
PART 2: User Interaction
************************
First query: Artist name (may be part of the name)
: ley
ARTIST                   SONG                          DATE        RANK  PREVIOUS RANK
Bailey Zimmerman         Where It Ends                 2024-05-04  65    38
Beyonce & Miley Cyrus    II Most Wanted                2024-05-04  84    47
YG Marley                Praise Jah In The Moonlight   2024-05-04  94    76
Bailey Zimmerman         Where It Ends                 2024-05-11  50    65
Beyonce & Miley Cyrus    II Most Wanted                2024-05-11  89    84
Bailey Zimmerman         Where It Ends                 2024-05-18  45    50
Beyonce & Miley Cyrus    II Most Wanted                2024-05-18  93    89
Bailey Zimmerman         Where It Ends                 2024-05-25  37    45
Second query: Song title (may be part of the name)
: greed
REQUESTED SONG                DATE        WEEKS ON BOARD
Greedy                        2024-05-04  32
Songs with more weeks on board than the requested song:
SONG                          DATE        EXTRA WEEKS ON BOARD
Lose Control                  2024-05-04  5
I Remember Everything         2024-05-04  3
Cruel Summer                  2024-05-04  19
Lose Control                  2024-05-11  6
I Remember Everything         2024-05-11  4
Greedy                        2024-05-11  1
Cruel Summer                  2024-05-11  20
Lose Control                  2024-05-18  7
I Remember Everything         2024-05-18  5
Cruel Summer                  2024-05-18  21
Greedy                        2024-05-18  2
Agora Hills                   2024-05-18  1
Lose Control                  2024-05-25  8
Stick Season                  2024-05-25  1
I Remember Everything         2024-05-25  6
Cruel Summer                  2024-05-25  22
Greedy                        2024-05-25  3
Agora Hills                   2024-05-25  2
Sample Run 2 - Part 2: "tin" & "control"
************************
PART 2: User Interaction
************************
First query: Artist name (may be part of the name)
: tin
ARTIST                   SONG                          DATE        RANK  PREVIOUS RANK
Justin Timberlake        Selfish                       2024-05-04  100   60
Bryan Martin             We Ride                       2024-05-11  93    0
Bryan Martin             We Ride                       2024-05-18  82    93
Bryan Martin             We Ride                       2024-05-25  88    82
Second query: Song title (may be part of the name)
: control
REQUESTED SONG                DATE        WEEKS ON BOARD
Lose Control                  2024-05-04  37
Songs with more weeks on board than the requested song:
SONG                          DATE        EXTRA WEEKS ON BOARD
Cruel Summer                  2024-05-04  14
Lose Control                  2024-05-11  1
Cruel Summer                  2024-05-11  15
Lose Control                  2024-05-18  2
Cruel Summer                  2024-05-18  16
Lose Control                  2024-05-25  3
I Remember Everything         2024-05-25  1
Cruel Summer                  2024-05-25  17
Sample Run 3 - Part 2: Longest song on chart
************************
PART 2: User Interaction
************************
First query: Artist name (may be part of the name)
: Djo
ARTIST                   SONG                          DATE        RANK  PREVIOUS RANK
Djo                      End Of Beginning              2024-05-04  56    24
Djo                      End Of Beginning              2024-05-11  39    56
Djo                      End Of Beginning              2024-05-18  39    39
Djo                      End Of Beginning              2024-05-25  40    39
Second query: Song title (may be part of the name)
: Cruel
REQUESTED SONG                DATE        WEEKS ON BOARD
Cruel Summer                  2024-05-04  51
Songs with more weeks on board than the requested song:
SONG                          DATE        EXTRA WEEKS ON BOARD
Cruel Summer                  2024-05-11  1
Cruel Summer                  2024-05-18  2
Cruel Summer                  2024-05-25  3
Sample Run 4 - Part 2: No matching artist or songs
************************
PART 2: User Interaction
************************
First query: Artist name (may be part of the name)
: nobody
There is no such artist in the file.
Second query: Song title (may be part of the name)
: nothing here
No songs matched that name
General Notes, Requirements
- The information and titles that you show to the user should be similar to what is shown in the sample runs (they don't have to be exactly the same).
- You may load the whole file into memory (list/dictionary/...) at once.
- We may use a different data file (but with the same file name and structure) to mark your assignment's correctness. So do NOT "hard code" anything based on the data. It is fine to hard-code the table output format (printed to the screen), such as column titles and column widths. Any test file will have identical columns, in the same order as the sample data files.
- You must write Python code that calculates and prints out the answers to the questions, by processing the CSV file in your program. You get no points just for providing the exact answers by calculating the answers in Excel and printing them.
- You must NOT import any module other than pathlib (e.g., do not import modules that provide CSV file manipulation methods), because one objective of this coding exercise is for you to practice basic text file manipulation. We will work with modules later in the course.
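Opening the data file through pathlib and reading it twice with seek(0), as the Part 2 hints describe, might look like this. It is a sketch assuming the file is UTF-8 text whose first line is the header; count_data_lines and count_twice are hypothetical names used only to show the rewind.

```python
# Sketch: open the data file through pathlib (the one module allowed) and
# read it twice from the same file object by rewinding with seek(0).
# ASSUMPTION: the file is UTF-8 text whose first line is the header.
from pathlib import Path


def count_data_lines(data_file):
    """Skip one header line, then count the non-blank lines that follow."""
    data_file.readline()  # skip the header line
    count = 0
    for line in data_file:
        if line.strip() != "":
            count += 1
    return count


def count_twice(path):
    """Read the file, rewind it with seek(0), and read it again."""
    with Path(path).open("r", encoding="utf-8") as data_file:
        first_pass = count_data_lines(data_file)
        data_file.seek(0)  # back to the first character, so the header is read again
        second_pass = count_data_lines(data_file)
    return first_pass, second_pass
```

Run from the assignment folder, count_twice("hot-100-May2024.csv") should report the same count for both passes. Note that the header has to be skipped again after every seek(0).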
Readability Guidelines
- Ensure that there is a comment at the top of the file with:
  - a description of the file
  - your name & date
- All import statements should be placed at the top of the file, immediately after the header
- Modules should only be included once per file
- Well-named variables (describes their purpose)
- Each line of code should be less than 100 characters
- Group your code into smaller sections or paragraphs, and give each a descriptive comment. Have a blank line between each "paragraph" of code that does one thing. Do not have too many extra blank lines in the code beyond what is reasonably needed for code readability.
Submission
Submit your python file directly to CourSys (don't ZIP it). Do not submit any of the .CSV files.
Submission Notes
- All submissions will be compared for unexplainable similarities. We expect submissions will be somewhat similar for this assignment but do your own original work; we are checking!
- Do not email/give your code to another student. Do not accept code from another student. Do not post your code online.
Troubleshooting Submission Problems
Here are some things to look at if you have challenges submitting:
- When submitting your source code to CourSys, it will enforce that the files have the correct names. Ensure your file names are exactly as expected.
- Ensure your files have a .py file extension.
- After submitting, you can verify if any files were submitted by viewing the assignment page on CourSys. It should allow you to view/download your submission.
Marking Notes
- Code that does not run (i.e., crashes or produces an error that stops the program) will get a mark of 0.
- To ensure your code works, try to finish early so you can get help from TAs if needed! It is better to submit code that works perfectly but doesn't contain all the features than code that (potentially) has all the features but crashes.
- Each line of your code should NOT exceed 100 characters. This is to help you develop a habit of good code formatting.
  - You could also use intermediate variables to store parts of long messages to be printed (more situations will be seen as the course advances).
  - If you are using an offline IDE like VS Code or IDLE, look at the bottom of the editor, which often shows your position in the line as a column number (e.g., "Col").
  - If you are using Replit, first go to Settings on the left panel and set "Wrapping" to "none". Then use this sentence as a guide:
# This line has exactly 100 characters (including the period), use it to keep each line under limit.
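As an optional self-check for the 100-character rule, a short separate script can list over-long lines. This is a hypothetical helper for checking your own work, not a course-provided tool. It treats more than 100 characters as too long, matching the guide line above, and it only uses pathlib.

```python
# Sketch: list the lines of a .py file that are longer than 100 characters.
# A hypothetical helper script for checking your own work, not a course tool.
from pathlib import Path

LINE_LIMIT = 100


def find_long_lines(source_text, limit=LINE_LIMIT):
    """Return (line_number, length) for every line longer than limit."""
    long_lines = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        if len(line) > limit:
            long_lines.append((line_number, len(line)))
    return long_lines


def report_long_lines(path):
    """Print one message per over-long line in the file at path."""
    text = Path(path).read_text(encoding="utf-8")
    for line_number, length in find_long_lines(text):
        print(f"Line {line_number}: {length} characters")
```

Calling report_long_lines on your assignment file prints nothing when every line is within the limit.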