Getting All The Data From The File

Because C manages data at a fairly low level, it gives you little built-in help with reading data from files. Even so, you can make your file operations work reliably. There are three strategies for reading data when you do not know in advance how much of it a file holds. Each has its own uses and benefits.

One Way: Using a Sentinel

You can use sentinel values to identify the end of each line in your text files, or to indicate the end of a text file. A sentinel is an unusual value, one that would not be expected to appear in that particular data set. For example, if you had a file with some rows of numbers that had to be read back later, you could output a sentinel at the end of your data, just before the last endline. When the data is reacquired, the input process will then know when all the data from each line has been read. Your original output process might look like the following.

// header file
#include "File_Output_Utility.h"

// global constant
const int SENTINEL_VALUE = -1;

// main function
int main()
   {
    // initialize function/variables
    char fileName[] = "sentineldata.txt";
    int number, numToOutput = 5;

    // open output file, test for success
       // function: openOutputFile
    if( openOutputFile( fileName ) )
       {
        // loop across five values
        for( number = 1; number <= numToOutput; number++ )
           {
            // output the number with a space between
               // function: writeCharacterToFile, writeIntegerToFile
            writeCharacterToFile( SPACE );
            writeIntegerToFile( number );
           }
        // end loop

        // output a sentinel value after the data
           // function: writeCharacterToFile, writeIntegerToFile, writeEndlineToFile
        writeCharacterToFile( SPACE );

        writeIntegerToFile( SENTINEL_VALUE );

        writeEndlineToFile();

        // close file
           // function: closeOutputFile
        closeOutputFile();
       }

       // otherwise, assume file open failed
       else
        {
         // display failed output
            // function: printf
         printf( "Output operation failed.\n" );
        }

    // return success
    return 0;
   }

Note that the program compiling/building command for this would be:

gcc -Wall testprog.c File_Input_Utility.c File_Output_Utility.c -o test.exe

Watch this video to see the above code used as a program for writing data with a sentinel. The following would be found in the file sentineldata.txt:

1 2 3 4 5 -1

The recovery code would then look like the following.

// header file
#include "File_Input_Utility.h"

// global constants
const int SENTINEL_VALUE = -1;

// main program
int main()
   {
    // initialize function, variables
    char fileName[] = "sentineldata.txt";
    int number; 

    // open input file, test for success
       // function: openInputFile
    if( openInputFile( fileName ) )
       {
        // read prime for loop
           // function: readIntegerFromFile
        number = readIntegerFromFile();

        // loop until sentinel found
        while( number != SENTINEL_VALUE )
           {
            // display value found
               // function: printf
            printf( "Integer value: %d\n", number );

            // get the next value
               // function: readIntegerFromFile
            number = readIntegerFromFile();
           
           }
        // end loop

        // close input file
           // function: closeInputFile
        closeInputFile();
       }

    // otherwise, assume file open failure
    else
       {
        // display file open failure
           // function: printf
        printf( "File not found\n" );
       }

    // return success
    return 0;
   }
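For comparison, here is a minimal sketch of the same sentinel round trip written with only the standard C library instead of the course file utilities. The function names writeWithSentinel and countUntilSentinel are hypothetical, chosen just for this illustration. The reader also checks the result of fscanf, so the loop still ends if a file turns out to have no sentinel in it.

```c
#include <stdio.h>

/* assumed never to occur in the real data */
static const int SENTINEL_VALUE = -1;

/* Writes 1..count separated by spaces, then the sentinel and an endline.
   Returns 1 on success, 0 if the file could not be opened.
   (hypothetical helper, not part of the course utilities) */
int writeWithSentinel( const char *fileName, int count )
   {
    FILE *filePtr = fopen( fileName, "w" );
    int number;

    if( filePtr == NULL )
       {
        return 0;
       }

    for( number = 1; number <= count; number++ )
       {
        fprintf( filePtr, " %d", number );
       }

    fprintf( filePtr, " %d\n", SENTINEL_VALUE );
    fclose( filePtr );

    return 1;
   }

/* Reads integers until the sentinel is found or a read fails.
   Returns how many data values were read (-1 if the file cannot be
   opened) and places their total in *sumOut.
   (hypothetical helper, not part of the course utilities) */
int countUntilSentinel( const char *fileName, int *sumOut )
   {
    FILE *filePtr = fopen( fileName, "r" );
    int number, count = 0;

    *sumOut = 0;

    if( filePtr == NULL )
       {
        return -1;
       }

    /* the fscanf test guards against a file with no sentinel */
    while( fscanf( filePtr, "%d", &number ) == 1
                                     && number != SENTINEL_VALUE )
       {
        *sumOut += number;
        count++;
       }

    fclose( filePtr );

    return count;
   }
```

Calling writeWithSentinel with a count of 5 produces the same line shown earlier for sentineldata.txt, and countUntilSentinel then reports five values without ever treating the -1 as data.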

It is important to note here that it is not good practice to input and output data in the same function or operation, because doing so breaks modularity. Very soon, you will be using arrays, which means you will be able to input the data in one operation or function and output the data in a separate operation or function. Keep this in mind, but for right now, until you have studied arrays, you will have to work with reduced modularity.

Watch this video to see the above code used in a program.

A Second Way: Using Read Priming

The sentinel operation can be handy but once you use the read priming process, you will find that there is a simpler way to check for, and input, data. Consider the following data.

1 2 3

Note that there is no sentinel value here, just three data items (i.e., numbers). Now consider the following code. This first version is flawed, but the flaw is not obvious. Take a look.

// header file
#include "File_Input_Utility.h"

// global constants
   // none

// main program
int main()
   {
    char fileName[] = "numbervals.txt";
    int number;

    // open input file, check for success
       // function: openInputFile
    if( openInputFile( fileName ) )
       {
        // loop until file operation fails
           // function: checkForEndOfInputFile
        while( !checkForEndOfInputFile() )
           {
            // get input value
               // function: readIntegerFromFile
            number = readIntegerFromFile();

            // display value found
               // function: printf
            printf( "Integer value: %d\n", number );
           }
        // end loop

        // close file
           // function: closeInputFile
        closeInputFile();
       }

    // otherwise, assume file open failure
    else
       {
        // display file open failure
           // function: printf
        printf( "File not found\n" );
       }

    // return success
    return 0;
   }

The failure here is not obvious because it looks like the program would do everything correctly. What actually happens is that the program gets the first number and, without testing for file input failure, displays it; then it gets and displays the second number, and then the third. Having acquired the third number, the loop should stop. However, it has no reason to. In almost any text file, there is at least one endline after the last data item and before the end of the file. In fact, the file output utility guarantees that, and most other code should as well. Because the input has not yet run past that endline, the end-of-file test still reports false.

So, the loop continues into a fourth access operation. This time, when it reaches for a data item, nothing is there, and checkForEndOfInputFile() will now report true. However, the test is not made at this point, between getting the data and outputting it. Thus, the program will output incorrect data. Different systems may do different things. The file input utility operations return a zero character, zero value, or empty string (depending on which acquisition function is used), but other input tools may not change the number at all. Either way, a fourth value that is not in the file is output: it will be either zero or a repeat of the third value. Then, when the program cycles back to the while loop and the checkForEndOfInputFile() test, it discovers the failure and stops the loop . . . one iteration too late.

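The corrected version below reads the first value before the loop starts and reads the next value as the last action inside the loop. It does not look very different from the flawed loop, but the difference is significant.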

// header file
#include "File_Input_Utility.h"

// global constants
   // none

// main program
int main()
   {
    char fileName[] = "numbervals.txt";
    int number;

    // open input file, check for success
       // function: openInputFile
    if( openInputFile( fileName ) )
       {
        // get first value - read prime
           // function: readIntegerFromFile
        number = readIntegerFromFile();

        // loop until file operation fails
           // function: checkForEndOfInputFile
        while( !checkForEndOfInputFile() )
           {
            // value verified, so it can be displayed
               // function: printf
            printf( "Integer value: %d\n", number );

            // get input value
            // last operation prior to test in while loop
               // function: readIntegerFromFile
            number = readIntegerFromFile();
           }
        // end loop

        // close file
           // function: closeInputFile
        closeInputFile();
       }

    // otherwise, assume file open failure
    else
       {
        // display file open failure
           // function: printf
        printf( "File not found\n" );
       }

    // return success
    return 0;
   }

The checkForEndOfInputFile() test cannot look ahead to find out if it is at the end of the file; it can only report whether the file stream object’s most recent action was a success or a failure. This backward-looking operation works fine as long as you follow a single rule: When you don't know how much data is in a file, always test the input file for "not at end of file" before using the data you last attempted to input. In other words, every time you "reach" for input from a file, you should do nothing else with that value until you have called checkForEndOfInputFile(). This is an important rule. If you do not follow it, you will not get the correct data into your program.

To repeat, when inputting unknown quantities of data, if you do not prime your loop before starting it, and/or you do not “re-prime” your loop by inputting the data at the end of the loop, data access failure is very likely. You must prime your loop before starting it, and you must “re-prime” your loop with an input action last thing in the loop block.
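The two loop shapes can also be compared using only the standard C library. The following is a minimal sketch, not the course utility code: the function names countWithoutPriming and countWithPriming are hypothetical, and feof stands in for checkForEndOfInputFile. It assumes the file holds only integers and ends with an endline, as described above. Each function returns how many values its loop treated as valid data.

```c
#include <stdio.h>

/* Flawed shape: the end-of-file test comes BEFORE the read, so the
   failed final read is counted as though it were data.
   (hypothetical helper for illustration) */
int countWithoutPriming( FILE *filePtr )
   {
    int number = 0, count = 0;

    while( !feof( filePtr ) )
       {
        /* read result is deliberately not checked here */
        fscanf( filePtr, "%d", &number );

        /* value is used whether or not the read succeeded */
        count++;
       }

    return count;
   }

/* Correct shape: read, test, use the value, then read again as the
   last action in the loop.
   (hypothetical helper for illustration) */
int countWithPriming( FILE *filePtr )
   {
    int number = 0, count = 0;

    /* read prime */
    fscanf( filePtr, "%d", &number );

    while( !feof( filePtr ) )
       {
        /* value verified, so it can be used */
        count++;

        /* re-prime: last operation before the test */
        fscanf( filePtr, "%d", &number );
       }

    return count;
   }
```

For a file containing the three numbers shown earlier followed by an endline, the first function counts four values and the second counts three. In plain standard C, the same rule is often applied by testing the result of the read itself, for example while( fscanf( filePtr, "%d", &number ) == 1 ), which also stops the loop when the file contains something that is not a number.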

Watch this video to see the failure of this process, and then watch this video to see the right way to do it. It is important to understand the failure of not read priming before starting the loop so that you do not make this mistake in the future.

A Third Way: Using File Header Information

Either of the first two ways is acceptable, but a third way is to place a number at the beginning of the file indicating how many items there are in a line, how many rows of items there are in the file, or some combination of these. This third form of managing unknown input quantities is the best because it does not rely on locating a sentinel value or on finding the end of the file. It does mean your program relies on the count having been written correctly when the file was created, but if this is done carefully, it works fine. Here is an example of a text file with the data, and what is called the header part of the file (the first line shown below).

number of items: 17

65, 42, 19, 55, 57,

22, 14, 98, 33, 15,

27, 11, 44, 13, 18,

85, 93

The code to read this data could be written as follows:

// header file
#include "File_Input_Utility.h"

// global constants
const char COLON_CHAR = ':';

// main program
int main()
   {
    // initialize function, variables
    char fileName[] = "testfile.txt";
    char inputString[ MAX_STR_LEN ];
    int counter, number, numberOfItems; 

    // open the file, check for success
       // function: openInputFile
    if( openInputFile( fileName ) )
        {
         // read in the "number of items" string, ignore
            // function: readStringToDelimiterFromFile
         readStringToDelimiterFromFile( COLON_CHAR, inputString );

         // read in the number of items in the loop
            // function: readIntegerFromFile
         numberOfItems = readIntegerFromFile(); 

         // loop across given number of items
         for( counter = 0; counter < numberOfItems; counter++ )
            {
             // get a value
                // function: readIntegerFromFile
             number = readIntegerFromFile();

             // capture and consume comma from data stream
                // function: readCharacterFromFile
             readCharacterFromFile();

             // display value found
                // function: printf
             printf( "Integer value: %d\n", number );
            }

         // close file
            // function: closeInputFile
         closeInputFile();

        }

    // otherwise, assume file open failure
    else
       {
        // display file open failure
           // function: printf
        printf( "File not found\n" );
       }

    // return success
    return 0;
   }

Once the program gets the number of items, it uses that value inside the loop to input exactly that many values. Notice that the result of readCharacterFromFile is not assigned to a variable. There is no need to keep the commas, but they must still be pulled out of the stream so that the next call to readIntegerFromFile starts at the next number.

The above code acquires the header string "number of items" using the readStringToDelimiterFromFile function. Note that the function call does not return the string; it places the string in the array passed as a parameter. Like the commas, the captured string is not used, but the array must still be declared and passed as a parameter.
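The same header-driven approach can be sketched with only the standard C library. This is an illustration under stated assumptions, not the course utility code: the function name readCountedItems is hypothetical, and it expects the file layout shown above (a label ending in a colon, the count, then comma-separated integers). Instead of printing, it totals the values so that the input step stays separate from any output.

```c
#include <stdio.h>

/* Reads a file laid out as a label, a colon, an item count, and then
   that many comma-separated integers.
   Returns how many values were actually read, or -1 if the file cannot
   be opened or no count is found; the total goes in *sumOut.
   (hypothetical helper, not part of the course utilities) */
int readCountedItems( const char *fileName, int *sumOut )
   {
    FILE *filePtr = fopen( fileName, "r" );
    int counter, number, numberOfItems;

    *sumOut = 0;

    if( filePtr == NULL )
       {
        return -1;
       }

    /* skip the header label up to and including the colon;
       the * means the text is read but not stored */
    fscanf( filePtr, "%*[^:]:" );

    /* read the number of items */
    if( fscanf( filePtr, "%d", &numberOfItems ) != 1 )
       {
        fclose( filePtr );
        return -1;
       }

    /* loop across the given number of items */
    for( counter = 0; counter < numberOfItems; counter++ )
       {
        /* stop early if the header promised more than the file holds */
        if( fscanf( filePtr, "%d", &number ) != 1 )
           {
            break;
           }

        /* consume the comma if one is present */
        fscanf( filePtr, " ," );

        *sumOut += number;
       }

    fclose( filePtr );

    return counter;
   }
```

The early exit from the loop is a design choice for this sketch: if the count in the header is larger than the amount of data actually present, the function reports the smaller number it really read rather than trusting the header.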

A Side Note about String Input

Notice that the readStringToDelimiterFromFile function takes the delimiter as a parameter. There are three readString... functions available in the file input utilities. One of them, the one that was used here, gets a string up to some delimiter. A delimiter is just a character that divides data; it could be a period, a comma, a semicolon, or in this case, a colon. When you look at the text data, you notice that the header text ends with a colon, so this is easy. You just tell your program to get all the text it can find up to a colon by calling readStringToDelimiterFromFile with a colon parameter. So all of the text "number of items" was captured by the function (but again not used).

On the other hand, sometimes you just want to get the text a word at a time. Another of these functions, readStringSegmentFromFile, gets any text up to a space, and since the delimiter is always the same, there is no need to pass it as a parameter; you just use readStringSegmentFromFile(). It turns out that this function just calls the other one with a "space" parameter, which keeps you from having to type that in. For the string "number of items", the readStringSegmentFromFile function would capture "number". This is another example of both code reuse and making tools that are adaptable to different operations.
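The relationship between the two functions can be sketched with the standard C library. This is only an illustration of the idea: readToDelimiter and readSegment are hypothetical names, and the real file input utility functions may differ in their parameters and in details such as how leading spaces are handled.

```c
#include <stdio.h>

/* Copies characters from the file into buffer until the delimiter or
   the end of the file is reached; the delimiter is consumed but not
   stored. Characters that do not fit in the buffer are discarded.
   Returns the length of the captured string.
   (hypothetical helper, not part of the course utilities) */
int readToDelimiter( FILE *filePtr, char delimiter,
                                        char *buffer, int capacity )
   {
    int inChar, length = 0;

    inChar = fgetc( filePtr );

    while( inChar != EOF && inChar != delimiter )
       {
        /* leave room for the terminating null character */
        if( length < capacity - 1 )
           {
            buffer[ length ] = (char)inChar;
            length++;
           }

        inChar = fgetc( filePtr );
       }

    buffer[ length ] = '\0';

    return length;
   }

/* Word-at-a-time version: reuses the function above with a space as
   the fixed delimiter, so the caller never has to supply one.
   (hypothetical helper, not part of the course utilities) */
int readSegment( FILE *filePtr, char *buffer, int capacity )
   {
    return readToDelimiter( filePtr, ' ', buffer, capacity );
   }
```

With the header line from the earlier data file, readToDelimiter called with a colon captures "number of items", while readSegment captures only "number". Notice that the character reads in readToDelimiter follow the read priming pattern from the previous section.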

Watch this video to see an example of this kind of input.

File I/O, Uncomplicated

The past few topics have provided you with the tools for storing data to, and acquiring data from, text files, most commonly applied to hard drive access. Use the first of these topics to become familiar with the concepts of file I/O; the topics after it provide specific code examples, issues, and operations that you can put to work in your own code.

Since you have been using this input and output structure for most of the time you have been programming, this new information extends and expands on what you already know rather than requiring you to learn something completely new. The next topic will expand on how you can acquire strings from text files, so it is again just an extension of what you have already learned.