Lab 8 - Threading Activity with uthash

Goals for this lab


  • Gaining a hands-on experience with uthash, a header-only library providing hash tables.
  • Practicing multi-threading in C including starting the threads with arguments, and waiting for their finish.
  • Practicing thread synchronization in a parallel processing problem.
  • Using the thread sanitizer and detecting race conditions.


Prerequisite Skills for This Lab


  • Read about the uthash library.
  • C Skills
    • Able to use structs, arrays, pointers
    • Able to compile, run, and debug C programs using CMake.


Task 0: Download & Run


  1. Clone the Lab 8 starter project
    • Once you have the files, you are welcome to copy them elsewhere, such as your course-work repo.
  2. Read lab8.c::main() to see what the program does (you don't have to implement any code yet).
  3. Compile and run the program.


Task 1: Sort Table


  1. Sort the table before printing.
    • See the uthash docs for information on sorting.
    • Find the code in main() which calls the function to print the counts to the screen.
    • Before calling the function to print, call the uthash function to sort the data.
    • Hint: The sort_func() function is already written for you! You just need to pass it and the word_map to the correct uthash function.
  2. Output should be:
    $ ./main
    Word                            Count
    brown                           2
    dog                             1
    fox                             2
    jumps                           1
    lazy                            1
    over                            1
    quick                           1
    the                             4


Task 2: Implement count_words_parallel(...)


Next, we'll get the program to count the words in parallel on multiple threads.

  1. Call count_words_parallel(...)
    • In main(), it currently calls count_words_seq(). Comment out this line, and uncomment the line calling count_words_parallel().
  2. Complete the partially implemented function count_words_parallel():
    • Initialize the count_mutex; it will be shared between all threads to synchronize access to the shared hash-table.
    • Inside the loop:
      • Initialize the threads_args[] value for the current thread. Call the pack_args() function to create the necessary data structure and fill in the values.
      • Launch the thread by calling pthread_create(..). Pass in:
        • the thread function counter_thread_func
        • pointer from the current element in threads_args[] (passed to the thread function by pthread_create()).
      • Store the thread ID into the correct location in threads[].
    • Wait for each of the threads to finish.
      • Hint: pthread_join().
    • Cleanup all dynamically allocated memory.
      • Hint: Look at threads_args[].
  3. Test your program
    • Note: It is not yet thread safe; we'll look into that next. Does it still work?


Task 3: Thread Sanitizer


  1. Enable the thread sanitizer and check the thread-safety of the implementation.
    • You'll need to change CMakeLists.txt to enable it.
    • Hint: add the following after the project definition in your CMakeLists.txt.
      add_compile_options(-fsanitize=thread -g -O1)
      add_link_options(-fsanitize=thread -g -O1)
  2. Run the program.
    • You should see output similar in format to:
      ==================
      WARNING: ThreadSanitizer: data race (pid=10434)
      Read of size 8 at 0x7ffd2093e8b8 by thread T2:
      ...
      Previous write of size 8 at 0x7ffd2093e8b8 by thread T1:
      ..
      Location is stack of main thread.
      Location is global '<null>' at 0x000000000000 ([stack]+0x00000001e8b8)
      ..


Task 4: Thread-Safe


  1. Make add_word_counts_in_chunk(...) thread-safe.
    • The function already takes in a mutex: just lock and unlock it as needed.
    • Hint: Don't lock/unlock for the whole function, just smaller part.
    • Hint: Lock at start of each pass through the loop; unlock at end of each pass through the loop.

4. Optional Challenges


  1. Optional: Lots of Data
  2. Optional: Thread Analysis
    • Manually time and compare how long it takes to process the data with different numbers of thread: one, two, and three threads.
    • How much time does the added complexity of running the operations in parallel save us?
  3. Optional: Generate data
    • Write some code to generate a table of data showing how long it takes to process Hamlet for each number of threads between 1 and about 10. This could be a script or C code.
    • Print the result to the screen and copy it into a program like LibreOffice Calc to graph.
    • Try running each configuration multiple times and comparing the differences.


Submission


Submit your lab8.c C code to CourSys; the file name must be an exact match to what CourSys is expecting, otherwise it won't accept it.

Submissions will be marked for completion. It must be valid C code that runs (however we are unlikely to actually compile and run the code). You do not need to complete any optional steps.