Lab 8 - Threading Activity with uthash
Expand for the usual lab directions
- Submit your completed lab online to CourSys (not Canvas, not GitHub).
- Labs are marked on completion, not correctness, so complete each part to the best of your ability and learn.
- Lab is due Sunday and may be submitted up to 3 days late. Further extensions possible only in exceptional cases.
- It is recommended that students attend in-person lab sections for lots of support!
- You are invited to come to any (or more than one) lab section.
- There is no need to attend the section you are enrolled in.
- There is no need to attend any lab sections: there is no attendance taken.
- You can complete the lab on your own time or own computer if you like.
- Time suggestions are given here to guide students who are working on the lab during lab times. However, the entire lab must be completed by everyone for marks.
- While completing these labs, you are encouraged to help your classmates and receive as much help as you like. Assignments, however, are individual work and you must not work with another person on assignments.
- I recommend coming to the lab and working with someone.
- Failing that, I recommend getting together with someone in the class outside of lab time to work through the lab together.
- Failing that, you are welcome to complete the lab exercise without any collaboration.
- Each student must submit their lab to get marks.
- Academic honesty expectations for labs are that each student types their own lab solution. It is OK to share ideas and help each others with specific coding issues and design. It is not permitted to submit another student's file for credit.
- If using CSIL lab PC:
- [Only CSIL PC] If in Windows, reboot to Linux; while booting, select Ubuntu from boot menu.
- [Only CSIL PC] Don't setup GitHub tokens in the
cmpt201
container because it may be shared with other users. - [Only CSIL PC] Delete any possible previous docker container before starting the lab:
docker rm cmpt201
- [Only CSIL PC] When done your lab, copy your solution out of the container, then execute the above docker command to delete your docker container to leave it clean for the next user.
- You do not need to use
.record
Goals for this lab
- Gaining a hands-on experience with
uthash
, a header-only library providing hash tables. - Practicing multi-threading in C including starting the threads with arguments, and waiting for their finish.
- Practicing thread synchronization in a parallel processing problem.
- Using the thread sanitizer and detecting race conditions.
Prerequisite Skills for This Lab
- Read about the
uthash
library. - C Skills
- Able to use structs, arrays, pointers
- Able to compile, run, and debug C programs using CMake.
Task 0: Download & Run
- Clone the Lab 8 starter project
- Read
lab8.c::main()
to see what the program does (you don't have to implement any code yet). - Compile and run the program.
Task 1: Sort Table
- Sort the table before printing.
- See the uthash docs for information on sorting.
- Find the code in
main()
which calls the function to print the counts to the screen. - Before calling the function to print, call the uthash function to sort the data.
- Hint: The
sort_func()
function is already written for you! You just need to pass it and theword_map
to the correctuthash
function.
- Output should be:
$ ./main Word Count brown 2 dog 1 fox 2 jumps 1 lazy 1 over 1 quick 1 the 4
Task 2: Implement count_words_parallel(...)
Next, we'll get the program to count the words in parallel on multiple threads.
- Call
count_words_parallel(...)
- In
main()
, it currently callscount_words_seq()
. Comment out this line, and uncomment the line callingcount_words_parallel()
.
- In
- Complete the partially implemented function
count_words_parallel()
:- Initialize the
count_mutex
; it will be shared between all threads to synchronize access to the shared hash-table. - Inside the loop:
- Initialize the
threads_args[]
value for the current thread. Call thepack_args()
function to create the necessary data structure and fill in the values. - Launch the thread by calling
pthread_create(..)
. Pass in:- the thread function
counter_thread_func
- pointer from the current element in
threads_args[]
(passed to the thread function bypthread_create()
).
- the thread function
- Store the thread ID into the correct location in
threads[]
.
- Initialize the
- Wait for each of the threads to finish.
- Hint:
pthread_join()
.
- Hint:
- Cleanup all dynamically allocated memory.
- Hint: Look at
threads_args[]
.
- Hint: Look at
- Initialize the
- Test your program
- Note: It is not yet thread safe; we'll look into that next. Does it still work?
Task 3: Thread Sanitizer
- Enable the thread sanitizer and check the thread-safety of the implementation.
- You'll need to change
CMakeLists.txt
to enable it. - Hint: add the following after the
project
definition in yourCMakeLists.txt
.add_compile_options(-fsanitize=thread -g -O1) add_link_options(-fsanitize=thread -g -O1)
- You'll need to change
- Run the program.
- You should see output similar in format to:
================== WARNING: ThreadSanitizer: data race (pid=10434) Read of size 8 at 0x7ffd2093e8b8 by thread T2: ... Previous write of size 8 at 0x7ffd2093e8b8 by thread T1: .. Location is stack of main thread. Location is global '<null>' at 0x000000000000 ([stack]+0x00000001e8b8) ..
- You should see output similar in format to:
Task 4: Thread-Safe
- Make
add_word_counts_in_chunk(...)
thread-safe.- The function already takes in a mutex: just lock and unlock it as needed.
- Hint: Don't lock/unlock for the whole function, just smaller part.
- Hint: Lock at start of each pass through the loop; unlock at end of each pass through the loop.
4. [Optional] Challenges
- Optional: Lots of Data
- Change the program to process a lot of data, such as full text of hamlet!
- Optional: Thread Analysis
- Manually time and compare how long it takes to process the data with different numbers of thread: one, two, and three threads.
- How much time does the added complexity of running the operations in parallel save us?
- Optional: Generate data
- Write some code to generate a table of data showing how long it takes to process Hamlet for each number of threads between 1 and about 10. This could be a script or C code.
- Print the result to the screen and copy it into a program like LibreOffice Calc to graph.
- Try running each configuration multiple times and comparing the differences.
Submission
Submit just your lab8.c
C code to CourSys by Sunday midnight. The file name must be an exact match to what CourSys is expecting, otherwise it won't accept it. Remember you can submit up to 3 days late with no penalty if needed (but please plan to get it done on time to stay up to date on the course!)
Submissions will be marked for completion. You do not need to complete any optional steps.