CS50 PSet4: Recover

A guide to the ‘recover’ problem in CS50 Week 4.

Goal: To write a program in C that can recover JPEG images from a forensic file.

The program must accept one and only one command line argument, the name of the file the images will be recovered from.

The program should output each of the JPEG images recovered as a separate file. There are 50 in total to be recovered from the card.raw file provided.

The first step is to include the correct libraries and define the BYTE type, as recommended in the problem description for storing one byte of data. BYTE is a typedef of uint8_t (declared in stdint.h), an unsigned integer type that is exactly 8 bits, or one byte, wide.

#include <stdint.h>
#include <stdio.h>
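The BYTE definition described above is a single typedef. A minimal sketch, assuming the `typedef uint8_t BYTE;` form suggested in the problem description:

```c
#include <stdint.h>

// One byte of raw data from the card; uint8_t is exactly 8 bits wide
// and unsigned, so it holds values from 0x00 to 0xff.
typedef uint8_t BYTE;
```

Being unsigned matters later on, as it lets a byte be compared directly against values such as 0xff.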

Now we can begin our main() function, taking the argc and argv arguments, which represent the number of command line arguments and their contents respectively. As always when handling command line arguments, the first step is to check that the correct number have been specified, printing a usage message and returning an error code if not. For this program we want argc to equal two, as the program name itself counts as one argument.

int main(int argc, char *argv[])
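The argument check described above could be sketched as follows. The helper name check_usage, the wording of the usage message and the return value of 1 are all assumptions made for illustration; the check can just as easily be written inline at the top of main().

```c
#include <stdio.h>

// Hypothetical helper (not part of the original program): returns 0 when
// exactly one command line argument was supplied, and 1 otherwise.
int check_usage(int argc)
{
    // argc counts the program name too, so one real argument means argc == 2
    if (argc != 2)
    {
        // Assumed wording for the usage message
        printf("Usage: ./recover FILE\n");
        return 1;
    }
    return 0;
}
```

Inside main(), this might be used as `if (check_usage(argc) != 0) return 1;` before anything else is done.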

If the command line argument check passes, we can continue and open the card.raw file. We can declare a string variable using char * for the input file name for readability purposes, rather than referring to argv[1] throughout. The fopen() function should be used to open the input file in read mode ("r"), with the result assigned to a FILE * pointer through which the file can be read from its start. If the file cannot be opened, fopen() returns NULL, and this should be handled as another error.

// Open card.raw
char *input_file_name = argv[1];
FILE *input_pointer = fopen(input_file_name, "r");

if (input_pointer == NULL)
{
    printf("Error: cannot open %s\n", input_file_name);
    return 2;
}

With the input file opened, we can begin to process its contents. Before this however, some variables must be declared.

buffer is an array of 512 BYTEs, allowing a whole block to be read in with a single call, for efficiency purposes.

count keeps track of the number of images recovered so far.

img_pointer is a pointer to the file that will be written to.

filename will store the name of each of the JPEGs generated.

// Initialise variables
BYTE buffer[512];
int count = 0;
FILE *img_pointer = NULL;
char filename[8];

Now for the portion of the program that will recover the images. I have used a while loop here that will continue while there are still blocks to be read. The fread() function takes four arguments, which its documentation describes better than I ever could. The first two are the address where the data read should be stored and the size in bytes of each element to be read; the third is the number of elements to read, and the fourth is the file to read from. Essentially it reads values from input_pointer and writes them to buffer. It also returns the number of elements that have been successfully read. If this number differs from the third argument, 1 in this case, then a full block could not be read, which normally signifies that the end of the file has been reached, and the loop can terminate.
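To make the fread() loop condition concrete, here is a minimal sketch that only counts how many complete 512-byte blocks a file contains. The helper name count_full_blocks and the BLOCK_SIZE constant are assumptions introduced for this example and are not part of the original program.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

#define BLOCK_SIZE 512

// Hypothetical helper: counts the complete 512-byte blocks available
// from the current position of input.
long count_full_blocks(FILE *input)
{
    BYTE buffer[BLOCK_SIZE];
    long blocks = 0;

    // fread(destination, element size, element count, stream) returns the
    // number of whole elements read. We ask for 1 element of 512 bytes, so
    // any return value other than 1 means no full block was left to read.
    while (fread(buffer, BLOCK_SIZE, 1, input) == 1)
    {
        blocks++;
    }
    return blocks;
}
```

Asking for one element of 512 bytes, rather than 512 elements of one byte, means a partial block at the end of the file simply counts as a failed read.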

With the first 512 bytes written to buffer, a check can be performed to see if the block is the start of a new JPEG. The first three bytes must be 0xff, 0xd8 and 0xff, and bitwise manipulation is used to check that the first four bits of the fourth byte are 1110, as instructed by Brian in the walkthrough video. If this passes, the previous JPEG (if there is one) can be closed and a new one opened to write to. Here sprintf() can be used to give the file an appropriate name, such as 000.jpg for the first image.
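The signature check and the file naming can each be sketched as a small function. The helper names is_jpeg_start and make_filename are assumptions for illustration, and snprintf() is used here in place of sprintf() so that the write cannot overrun the 8-character array.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

// Hypothetical helper: returns 1 if the block begins with a JPEG signature
// (0xff 0xd8 0xff, then a byte whose upper four bits are 1110), else 0.
int is_jpeg_start(const BYTE *block)
{
    // Masking with 0xf0 zeroes the lower four bits, so every value from
    // 0xe0 to 0xef compares equal to 0xe0.
    return block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
           (block[3] & 0xf0) == 0xe0;
}

// Hypothetical helper: writes a name such as "007.jpg" into filename,
// which must have room for 8 chars (7 characters plus the terminating '\0').
void make_filename(char filename[8], int count)
{
    // %03i pads the number with leading zeros to three digits
    snprintf(filename, 8, "%03i.jpg", count);
}
```

The mask is what allows a single comparison to stand in for sixteen separate checks on the fourth byte.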

Whether it is the start of a new JPEG or not, the next step is to write the 512-byte block to the currently open file, a reference to which is stored in img_pointer. Any blocks that come before the first JPEG should be skipped, as img_pointer is still NULL at that point. Writing can be done using fwrite(), which functions very similarly to fread().

Once the entire input file has been scanned through, the last step is to close the currently open files and return 0 for a successful program.

// Repeat until end of card:
while (fread(buffer, 512, 1, input_pointer) == 1)
    // If start of a new JPEG (0xff 0xd8 0xff 0xe*):
    if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0)
        // If not first JPEG, close previous
        if (count != 0)
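Putting the steps described above together, a minimal sketch of the whole recovery loop might look like the following. Wrapping it in a function called recover_images, returning -1 when an output file cannot be opened, and using snprintf() rather than sprintf() are assumptions made for this example, not necessarily how the original solution was written.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

// Hypothetical helper: reads 512-byte blocks from input_pointer and writes
// each JPEG found to 000.jpg, 001.jpg, ... in the current directory.
// Returns the number of JPEGs found, or -1 if an output file cannot be opened.
int recover_images(FILE *input_pointer)
{
    BYTE buffer[512];
    int count = 0;
    FILE *img_pointer = NULL;
    char filename[8];

    // Repeat until end of card
    while (fread(buffer, 512, 1, input_pointer) == 1)
    {
        // If start of a new JPEG (0xff 0xd8 0xff 0xe*)
        if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0)
        {
            // If not first JPEG, close previous
            if (count != 0)
            {
                fclose(img_pointer);
            }

            // Name and open the next output file
            snprintf(filename, sizeof(filename), "%03i.jpg", count);
            img_pointer = fopen(filename, "w");
            if (img_pointer == NULL)
            {
                return -1;
            }
            count++;
        }

        // Blocks before the first signature belong to no JPEG, so skip them
        if (img_pointer != NULL)
        {
            fwrite(buffer, 512, 1, img_pointer);
        }
    }

    // Close the last JPEG, if any; the caller still has to close input_pointer
    if (img_pointer != NULL)
    {
        fclose(img_pointer);
    }
    return count;
}
```

In main(), this could be called with input_pointer once the file has been opened, followed by fclose(input_pointer) and return 0 for a successful run.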

Hopefully after that one you are now more familiar with some tricky concepts such as pointers and hexadecimal notation, as well as reading from and writing to files.

Again, this was a satisfying one, as you get a tangible output of images at the end!