Lab 1: Reading a line and tokenizing it

Prerequisite Skills for This Lab


  • Able to run the Docker dev environment for this course.
  • C Skills
    • Able to create a C program.
    • Able to use man pages.

1. Task


Your task is:

Write a program that receives a full line of user (keyboard) input, tokenizes it with the delimiter " " (space), and prints out each token one at a time on a new line. Use getline() for user input and strtok_r() for tokenization.

Example output:

Please enter some text: Tomorrow is Friday.
Tokens:
  Tomorrow
  is
  Friday.


2. Design


  1. Labs are co-operative! Introduce yourself to someone new!
    • You can work individually or in pairs (pair-programming).
    • Help each other!
    • For labs, you are welcome to share as much of your ideas and code as you like, and accept an unlimited amount of help from anyone.
  2. Open dev environment
    • See Resources page for docker commands.
    • I suggest opening 3 tabs / terminals: It's faster to switch tabs than to close/open NeoVim.
      • Tab 1: Use for man pages.
      • Tab 2: Use for editing code in NeoVim.
      • Tab 3: Use for compiling and running your program.
  3. Design your solution.
    • Don't Google or AI it; you need these skills. Labs are for figuring it out, not for completing work.
    • Use man pages for getline() and strtok_r().
    • Discuss with someone how these functions work (see hints below).
    • Suggestion: use pseudo code to plan your code.

Design Hints

  • Don't write any C code yet, sketch out your ideas first.
  • getline() allows you to read a full line of text from the keyboard:
    • View the man page with: man getline
    • No, really. Go read the man page now!
  • Tokenize using strtok_r():
    • Read its man page! It's a complex function.
    • Hint - The first call to strtok_r() behaves differently than subsequent calls:
      • 1st call: Pass in your full string, the delimiter (a space), and the address of a pointer to save its progress.
      • 2nd+ calls: Pass in NULL, the delimiter (a space), and the address of the pointer saving its progress.
    • The progress pointer saveptr is used by strtok_r() to keep track of how much of your string it has processed.
      • Think: why is saveptr passed as a pointer to a pointer?
      • Think: what will you need to pass in, and what will it give you back?
      • Try drawing a picture.
      • Hint - Why a pointer-to-pointer? strtok_r() needs to be able to change where this pointer points, so we have to pass it the pointer using *pass-by-pointer*. Therefore, pass the address of the pointer; hence its type is char**.
    • Hint - Return value: Read the man page about strtok_r()'s return value to get the actual tokens.
    • Hint - Loop: Design a loop so that you can process all tokens. Read the return value of strtok_r() to decide how to stop the loop.


3. Implement it!


  • Write a little code at a time:
    • Write a couple of lines, compile, run, debug. Repeat!
    • Compile with the address sanitizer so it displays errors on bad pointers when running:
      clang lab1.c -fsanitize=address
    • Run with ./a.out
    • Hint - Compile and Run: Combine compiling and running into one command using && (AND):
      clang lab1.c -fsanitize=address && ./a.out
  • Do simplest thing first (like print "Hello world!"). Then add functionality bit-by-bit.

Implementation Hints

  • Hint - _POSIX_C_SOURCE: Some functions (like getline()) need the _POSIX_C_SOURCE feature test macro. Define it on the first line of your C file, before any #include, such as:
    
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <...>
    
  • Read in a string with getline(); print your buffer to the screen to ensure it is correct.
    • getline() can allocate a buffer for you, or resize your buffer if it's too small.
    • Since it can allocate memory, you pass it both the address of your buffer pointer, and the address of your variable storing the buffer size.
    • Since it dynamically allocates memory, you need to free() that memory.
    • Hint - Example code
      
      char *buff = NULL;
      size_t size = 0;
      ssize_t num_char = getline(&buff, &size, stdin);
      ...         // Use buff
      free(buff);
      
  • Hints for strtok_r():
    • Don't use plain strtok(): It is not thread-safe! This will cause problems in later programs.
    • Repeatedly call strtok_r() to get the next token.
      • On the first call, you'll need to pass in your full string:
        char *saveptr;
        char *ret = strtok_r(buff, " ", &saveptr);
      • On the second and subsequent calls, pass in NULL (saveptr remembers where to continue):
        char *ret2 = strtok_r(NULL, " ", &saveptr);
  • Add error handling
    • Many functions in the C Standard Library (stdlib) return a negative number (often -1) when they fail.
    • Almost every function call should be checked for errors. If you get an error, call perror() and exit the program.
    • Hint - Example code
      
      char *buff = NULL;
      size_t size = 0;
      ssize_t num_char = getline(&buff, &size, stdin);
      if (num_char == -1) {
          perror("getline failed");
          exit(EXIT_FAILURE);
      }
      ...
      free(buff);
      
  • Comparing getline() return value vs n argument:
    • Recall getline()'s prototype:
      ssize_t getline(char **lineptr, size_t *n, FILE *stream);
    • The n argument points to the size of the buffer. That size is of type size_t because it must be 0 or more (unsigned).
    • getline() returns the number of characters read, including the newline. It is of type ssize_t because it can be negative (-1 for failure or end-of-file).
    • Pay attention to the difference between size_t and ssize_t (unsigned vs signed)!


4. Reviewing


  • During the last 15 minutes of each lab hour, TAs will show a sample solution.
  • TAs will talk through how the solution works and discuss its implementation.
  • Suggested topics to discuss:
    1. Where is memory allocated for getline()?
    2. How does strtok_r() use memory?
    3. Why are some parameters pointers, or pointers-to-pointers?
    4. How does the solution loop through all tokens?
    5. What are some good values to test the code with?

5. [Optional] Challenges


You can try these if you like; however, we are not marking them! They are just for your own learning.

  1. Optional: Test with multiple spaces between tokens. Test with tabs. Can you explain how your program behaves in each case, and why?
  2. Optional: Make your program loop to repeatedly ask the user for a new string and then tokenize it. Exit when the user just presses enter.
  3. Optional: Move your tokenization code into a function. Pass the function the source string and the number of characters and have it print the tokens to the screen.
  4. Optional: Try to refactor your code to clean it up. Try having only one call to strtok_r(). Put it in a loop and manage its pointers correctly.

Submission


Submit your C code to CourSys; the file name must be an exact match to what CourSys is expecting, otherwise it won't accept it.

Submissions will be marked for completion. Your submission must be valid C code that runs (although we are unlikely to actually compile and run it). You do not need to complete any optional steps.

If you were working on a CSIL machine, delete your docker container: docker rm cmpt201
(Don't do this if you were on your own laptop!)