Lab 8 - Threading Activity with uthash


Expand for the usual lab directions
  • Submit your completed lab online to CourSys (not Canvas, not GitHub).
    • Labs are marked on completion, not correctness, so complete each part to the best of your ability and learn.
    • Lab is due Sunday and may be submitted up to 3 days late. Further extensions possible only in exceptional cases.
  • It is recommended that students attend in-person lab sections for lots of support!
    • You are invited to come to any (or more than one) lab section.
    • There is no need to attend the section you are enrolled in.
    • There is no need to attend any lab sections: there is no attendance taken.
    • You can complete the lab on your own time or own computer if you like.
    • Time suggestions are given here to guide students who are working on the lab during lab times. However, the entire lab must be completed by everyone for marks.
  • While completing these labs, you are encouraged to help your classmates and receive as much help as you like. Assignments, however, are individual work and you must not work with another person on assignments.
    • I recommend coming to the lab and working with someone.
    • Failing that, I recommend getting together with someone in the class outside of lab time to work through the lab together.
    • Failing that, you are welcome to complete the lab exercise without any collaboration.
    • Each student must submit their lab to get marks.
    • Academic honesty expectations for labs are that each student types their own lab solution. It is OK to share ideas and help each others with specific coding issues and design. It is not permitted to submit another student's file for credit.
  • If using CSIL lab PC:
    • [Only CSIL PC] If in Windows, reboot to Linux; while booting, select Ubuntu from boot menu.
    • [Only CSIL PC] Don't setup GitHub tokens in the cmpt201 container because it may be shared with other users.
    • [Only CSIL PC] Delete any possible previous docker container before starting the lab:
      docker rm cmpt201
    • [Only CSIL PC] When done your lab, copy your solution out of the container, then execute the above docker command to delete your docker container to leave it clean for the next user.
  • You do not need to use record.


Goals for this lab


  • Gaining a hands-on experience with uthash, a header-only library providing hash tables.
  • Practicing multi-threading in C including starting the threads with arguments, and waiting for their finish.
  • Practicing thread synchronization in a parallel processing problem.
  • Using the thread sanitizer and detecting race conditions.


Prerequisite Skills for This Lab


  • Read about the uthash library.
  • C Skills
    • Able to use structs, arrays, pointers
    • Able to compile, run, and debug C programs using CMake.


Task 0: Download & Run


  1. Clone the Lab 8 starter project
  2. Read lab8.c::main() to see what the program does (you don't have to implement any code yet).
  3. Compile and run the program.


Task 1: Sort Table


  1. Sort the table before printing.
    • See the uthash docs for information on sorting.
    • Find the code in main() which calls the function to print the counts to the screen.
    • Before calling the function to print, call the uthash function to sort the data.
    • Hint: The sort_func() function is already written for you! You just need to pass it and the word_map to the correct uthash function.
  2. Output should be:
    $ ./main
    Word                            Count
    brown                           2
    dog                             1
    fox                             2
    jumps                           1
    lazy                            1
    over                            1
    quick                           1
    the                             4


Task 2: Implement count_words_parallel(...)


Next, we'll get the program to count the words in parallel on multiple threads.

  1. Call count_words_parallel(...)
    • In main(), it currently calls count_words_seq(). Comment out this line, and uncomment the line calling count_words_parallel().
  2. Complete the partially implemented function count_words_parallel():
    • Initialize the count_mutex; it will be shared between all threads to synchronize access to the shared hash-table.
    • Inside the loop:
      • Initialize the threads_args[] value for the current thread. Call the pack_args() function to create the necessary data structure and fill in the values.
      • Launch the thread by calling pthread_create(..). Pass in:
        • the thread function counter_thread_func
        • pointer from the current element in threads_args[] (passed to the thread function by pthread_create()).
      • Store the thread ID into the correct location in threads[].
    • Wait for each of the threads to finish.
      • Hint: pthread_join().
    • Cleanup all dynamically allocated memory.
      • Hint: Look at threads_args[].
  3. Test your program
    • Note: It is not yet thread safe; we'll look into that next. Does it still work?


Task 3: Thread Sanitizer


  1. Enable the thread sanitizer and check the thread-safety of the implementation.
    • You'll need to change CMakeLists.txt to enable it.
    • Hint: add the following after the project definition in your CMakeLists.txt.
      add_compile_options(-fsanitize=thread -g -O1)
      add_link_options(-fsanitize=thread -g -O1)
  2. Run the program.
    • You should see output similar in format to:
      ==================
      WARNING: ThreadSanitizer: data race (pid=10434)
      Read of size 8 at 0x7ffd2093e8b8 by thread T2:
      ...
      Previous write of size 8 at 0x7ffd2093e8b8 by thread T1:
      ..
      Location is stack of main thread.
      Location is global '<null>' at 0x000000000000 ([stack]+0x00000001e8b8)
      ..


Task 4: Thread-Safe


  1. Make add_word_counts_in_chunk(...) thread-safe.
    • The function already takes in a mutex: just lock and unlock it as needed.
    • Hint: Don't lock/unlock for the whole function, just smaller part.
    • Hint: Lock at start of each pass through the loop; unlock at end of each pass through the loop.

4. [Optional] Challenges


  1. Optional: Lots of Data
  2. Optional: Thread Analysis
    • Manually time and compare how long it takes to process the data with different numbers of thread: one, two, and three threads.
    • How much time does the added complexity of running the operations in parallel save us?
  3. Optional: Generate data
    • Write some code to generate a table of data showing how long it takes to process Hamlet for each number of threads between 1 and about 10. This could be a script or C code.
    • Print the result to the screen and copy it into a program like LibreOffice Calc to graph.
    • Try running each configuration multiple times and comparing the differences.


Submission


Submit just your lab8.c C code to CourSys by Sunday midnight. The file name must be an exact match to what CourSys is expecting, otherwise it won't accept it. Remember you can submit up to 3 days late with no penalty if needed (but please plan to get it done on time to stay up to date on the course!)

Submissions will be marked for completion. You do not need to complete any optional steps.