Lab 1: Reading a line and tokenizing it
Prerequisite Skills for This Lab
- Able to run the Docker dev environment for this course.
- C Skills
- Able to create a C program.
- Able to use man pages.
1. Task
Your task is:
Write a program that receives a full line of user (keyboard) input, tokenizes it with the delimiter " " (space), and prints out each token one at a time on a new line. Use getline() for user input and strtok_r() for tokenization.
Example output:
Please enter some text: Tomorrow is Friday.
Tokens:
Tomorrow
is
Friday.
2. Design
- Labs are co-operative! Introduce yourself to someone new!
- You can work individually or in pairs (pair-programming).
- Help each other!
- For labs, you are welcome to share as much of your ideas and code as you like, and accept an unlimited amount of help from anyone.
- Open dev environment
- See Resources page for docker commands.
- I suggest opening 3 tabs / terminals: it's faster to switch tabs than to close/open NeoVim.
  - Tab 1: Use for man pages.
  - Tab 2: Use for editing code in NeoVim.
  - Tab 3: Use for compiling and running the program.
- Design your solution.
- Don't Google or AI it; you need these skills. Labs are for figuring it out, not for completing work.
- Use man pages for getline() and strtok_r().
- Discuss with someone how these functions work (see hints below).
- Suggestion: use pseudo code to plan your code.
Design Hints
- Don't write any C code yet; sketch out your ideas first.
- getline() allows you to read a full line of text from the keyboard:
  - View the man page with: man getline
  - No, really. Go read the man page now!
- Tokenize using strtok_r():
  - Read its man page! It's a complex function.
  - Hint - The first call to strtok_r() behaves differently than subsequent calls:
    - 1st call: Pass in your full string, the delimiter (a space), and the address of a pointer to save its progress.
    - 2nd+ calls: Pass in NULL, the delimiter (a space), and the address of the pointer saving its progress.
    - The progress pointer saveptr is used by strtok_r() to keep track of how much of your string it has processed.
      - Think: why is saveptr passed as a pointer to a pointer?
      - Think about what you'll need to pass in, and what it will give you back.
      - Try drawing a picture.
  - Hint - Why a pointer-to-pointer?
    strtok_r() needs to be able to change where this pointer points, so we have to pass it the pointer using *pass-by-pointer*. Therefore, pass the address of the pointer; hence its type is char **.
  - Hint - Return value
    Read the man page about strtok_r()'s return value to get the actual tokens.
  - Hint - Loop
    Design a loop so that you can process all tokens. Read the return value of strtok_r() to decide how to stop the loop.
3. Implement it!
- Write a little code at a time:
- Write a couple of lines, compile, run, debug. Repeat!
- Compile with the address sanitizer so it displays errors on bad pointers when running:
clang lab1.c -fsanitize=address
- Run with ./a.out
- Hint - Compile and Run
  Combine compiling and running into one command line using && (AND):
  clang lab1.c -fsanitize=address && ./a.out
- Do the simplest thing first (like printing "Hello world!"). Then add functionality bit-by-bit.
Implementation Hints
- Hint - _POSIX_C_SOURCE
  Some functions (like getline()) need the _POSIX_C_SOURCE feature test macro. Define it as the first line of your C file, before any #include, such as:
      #define _POSIX_C_SOURCE 200809L
      #include <...>
- Read in a string with getline(); print your buffer to the screen to ensure it is correct.
  - getline() can allocate a buffer for you, or resize your buffer if it's too small.
  - Since it can allocate memory, you pass it both the address of your buffer pointer and the address of your variable storing the buffer size.
  - Since it dynamically allocates memory, you need to free() that memory.
  - Hint - Example code
        char *buff = NULL;
        size_t size = 0;   // must be size_t, not char: getline() takes a size_t *
        ssize_t num_char = getline(&buff, &size, stdin);
        ... // Use buff
        free(buff);
- Hints for strtok_r():
  - Don't use plain strtok(): It is not thread-safe! This will cause problems in later programs.
  - Repeatedly call strtok_r() to get the next token.
    - On the first call, you'll need to pass in your full string:
          char *saveptr;
          char *ret = strtok_r(buff, " ", &saveptr);
    - On the second and subsequent calls, pass in NULL (saveptr remembers where to continue):
          char *ret2 = strtok_r(NULL, " ", &saveptr);
- Add error handling
  - Many functions in the C Standard Library report failure through their return value (often a negative number such as -1, or NULL for functions that return pointers).
  - Almost every function call should be checked for errors. If you get an error, call perror() and exit the program.
  - Hint - Example code
        char *buff = NULL;
        size_t size = 0;   // size_t, not char
        ssize_t num_char = getline(&buff, &size, stdin);
        if (num_char == -1) {
            perror("getline failed");
            exit(EXIT_FAILURE);
        }
        ...
        free(buff);
- Comparing getline()'s return value vs its n argument:
  - Recall getline()'s prototype:
        ssize_t getline(char **lineptr, size_t *n, FILE *stream);
  - The n argument is the size of the buffer. It is of type size_t because it must be 0 or more (unsigned).
  - getline() returns the number of characters read. It is of type ssize_t because it can be negative (for failure).
  - Pay attention to the difference between size_t and ssize_t (unsigned vs signed)!
4. Reviewing
- During the last 15 minutes of each lab hour, TAs will show a sample solution.
- TAs will talk through how the solution works and discuss its implementation.
- Suggested topics to discuss:
  - Where is memory allocated for getline()?
  - How does strtok_r() use memory?
  - Why are some parameters pointers, or pointers-to-pointers?
  - How does the solution loop through all tokens?
  - What are some good values to test the code with?
5. [Optional] Challenges
You can try these if you like; however, we are not marking them! They are just for your own learning.
- Optional: Test with multiple spaces between tokens. Test with tabs. Can you explain why your program behaves the way it does?
- Optional: Make your program loop to repeatedly ask the user for a new string and then tokenize it. Exit when the user just presses Enter.
- Optional: Move your tokenization code into a function. Pass the function the source string and the number of characters and have it print the tokens to the screen.
- Optional: Try to refactor your code to clean it up. Try having only one call to strtok_r(). Put it in a loop and manage its pointers correctly.
Submission
Submit your C code to CourSys; the file name must be an exact match to what CourSys is expecting, otherwise it won't accept it.
Submissions will be marked for completion. It must be valid C code that runs (however we are unlikely to actually compile and run the code). You do not need to complete any optional steps.
If you were working on a CSIL machine, delete your docker container: docker rm cmpt201
(Don't do this if you were on your own laptop!)