Lab 8 - Threading Activity with uthash
Goals for this lab
- Gaining a hands-on experience with
uthash
, a header-only library providing hash tables. - Practicing multi-threading in C including starting the threads with arguments, and waiting for their finish.
- Practicing thread synchronization in a parallel processing problem.
- Using the thread sanitizer and detecting race conditions.
Prerequisite Skills for This Lab
- Read about the
uthash
library. - C Skills
- Able to use structs, arrays, pointers
- Able to compile, run, and debug C programs using CMake.
Task 0: Download & Run
- Clone the Lab 8 starter project
- Read
lab8.c::main()
to see what the program does (you don't have to implement any code yet). - Compile and run the program.
Task 1: Sort Table
- Sort the table before printing.
- See the uthash docs for information on sorting.
- Find the code in
main()
which calls the function to print the counts to the screen. - Before calling the function to print, call the uthash function to sort the data.
- Hint: The
sort_func()
function is already written for you! You just need to pass it and theword_map
to the correctuthash
function.
- Output should be:
$ ./main Word Count brown 2 dog 1 fox 2 jumps 1 lazy 1 over 1 quick 1 the 4
Task 2: Implement count_words_parallel(...)
Next, we'll get the program to count the words in parallel on multiple threads.
- Call
count_words_parallel(...)
- In
main()
, it currently callscount_words_seq()
. Comment out this line, and uncomment the line callingcount_words_parallel()
.
- In
- Complete the partially implemented function
count_words_parallel()
:- Initialize the
count_mutex
; it will be shared between all threads to synchronize access to the shared hash-table. - Inside the loop:
- Initialize the
threads_args[]
value for the current thread. Call thepack_args()
function to create the necessary data structure and fill in the values. - Launch the thread by calling
pthread_create(..)
. Pass in:- the thread function
counter_thread_func
- pointer from the current element in
threads_args[]
(passed to the thread function bypthread_create()
).
- the thread function
- Store the thread ID into the correct location in
threads[]
.
- Initialize the
- Wait for each of the threads to finish.
- Hint:
pthread_join()
.
- Hint:
- Cleanup all dynamically allocated memory.
- Hint: Look at
threads_args[]
.
- Hint: Look at
- Initialize the
- Test your program
- Note: It is not yet thread safe; we'll look into that next. Does it still work?
Task 3: Thread Sanitizer
- Enable the thread sanitizer and check the thread-safety of the implementation.
- You'll need to change
CMakeLists.txt
to enable it. - Hint: add the following after the
project
definition in yourCMakeLists.txt
.add_compile_options(-fsanitize=thread -g -O1) add_link_options(-fsanitize=thread -g -O1)
- You'll need to change
- Run the program.
- You should see output similar in format to:
================== WARNING: ThreadSanitizer: data race (pid=10434) Read of size 8 at 0x7ffd2093e8b8 by thread T2: ... Previous write of size 8 at 0x7ffd2093e8b8 by thread T1: .. Location is stack of main thread. Location is global '<null>' at 0x000000000000 ([stack]+0x00000001e8b8) ..
- You should see output similar in format to:
Task 4: Thread-Safe
- Make
add_word_counts_in_chunk(...)
thread-safe.- The function already takes in a mutex: just lock and unlock it as needed.
- Hint: Don't lock/unlock for the whole function, just smaller part.
- Hint: Lock at start of each pass through the loop; unlock at end of each pass through the loop.
4. [Optional] Challenges
- Optional: Lots of Data
- Change the program to process a lot of data, such as full text of hamlet!
- Optional: Thread Analysis
- Manually time and compare how long it takes to process the data with different numbers of thread: one, two, and three threads.
- How much time does the added complexity of running the operations in parallel save us?
- Optional: Generate data
- Write some code to generate a table of data showing how long it takes to process Hamlet for each number of threads between 1 and about 10. This could be a script or C code.
- Print the result to the screen and copy it into a program like LibreOffice Calc to graph.
- Try running each configuration multiple times and comparing the differences.
Submission
Submit just your lab8.c
C code to CourSys by Sunday midnight. The file name must be an exact match to what CourSys is expecting, otherwise it won't accept it. Remember you can submit up to 3 days late with no penalty if needed (but please plan to get it done on time to stay up to date on the course!)
Submissions will be marked for completion. You do not need to complete any optional steps.