Data Abstraction

Data Structures as a Concept

Up to this point, you have been using primitive or fundamental data to solve your programming problems. These work well and they will still be used in the next topics. However, this data will be grouped together when there is a need to "package" it into something that represents a larger data quantity. An integer is a fine way to store someone's age, but what do you do if you want to represent the whole person? Most people are made up of a variety of things like two arms, two legs, two eyes, an age, a weight, a height, and so on. However, if you want to discuss a person in conversation, you don't say "this entity with two arms, two legs, . . ., etc."; you just say "this person". And even though you might not be aware of it, you have created an abstraction; you have represented a complex system with a symbol (i.e., the word "person") without mentioning all the details about that person.

Learning about abstraction will make up most of the rest of this reference, and the bonus is that it will help you think at a higher cognitive level. When you understand the concept of abstraction, you can manage data, and reality, at a much larger scale, and functionally process information at a higher level. And, so can your computer.

In computing, the term abstract data type, or ADT, is used frequently when data is grouped under the umbrella of some larger thing. A high school is not just a building; it is also classrooms, desks, teachers, students, hallways, and so on. However, if you are going to construct a high school abstraction, you will need to represent each part of the high school with its own data quantity. The data could just be an integer, say, for the number of classrooms, but it might also be another ADT representing a specific classroom.

The classroom ADT might have a number of chairs, a teacher's desk, some windows, and much more. So you can see this abstraction process can build on itself to the point that you could have an entire city ADT comprised of all the component ADTs such as houses, fire stations, hospitals, and so on.

In Computer Science we seek ways to make our world computable so we can solve problems or create solutions with our devices (and our intellect). Using ADTs is one of the most important tools for making things computable.

Defining a struct

The C programming language allows creation of an ADT by using a quantity called a struct. A struct is very simply a way to do what you just read about. Data can be organized under the umbrella or abstraction of a name. Start with a fairly simple abstraction called a student, which, for purposes of this discussion, has a fairly simple set of data components such as name, age, GPA, and student ID. A C struct designed to manage this abstraction would look like the following:

   struct Student
      {
       char name[ STD_STR_LEN ];
       int age;
       double GPA;
       long studentId;
      };

The above declares a data structure called Student. The definition does not actually declare a variable. Declaring a variable with a structure definition would look like the following:

   struct Student
      {
       char name[ STD_STR_LEN ];
       int age;
       double GPA;
       long studentId;
      } someStudent;

However, creating a variable as part of an ADT definition is somewhat self-defeating. The idea of creating a data structure is to be able to use it throughout your program as a data type. Think about that; you now have the ability to create your own data types! Defining variables at the same time a data type is being defined is not a good practice, so this form of variable declaration will not be used in this reference.

That is all there is to creating a struct. You figure out what data will be necessary to make up your ADT, and you place them in the struct. As you will soon see, these values are now considered to be members of this data type. The next thing to look at is how to use it.

Using a struct, with Dot Operators

struct data types are commonly defined in header files; they can also be defined in the area above the main function, although that is less common. Once the struct has been defined, it can be used as a data type to create variables in a program. Consider the following code that uses the Student struct defined above.

   int main()
      {
       // initialize function/variables
       struct Student firstStudent, secondStudent;
       double averageGPA;

       // set first student data
          // function: strcpy
       strcpy( firstStudent.name, "Sally" );
       firstStudent.age = 19;
       firstStudent.GPA = 3.46;
       firstStudent.studentId = 654321;

       // set second student data
          // function: strcpy
       strcpy( secondStudent.name, "Ron" );
       secondStudent.age = 18;
       secondStudent.GPA = 3.44;
       secondStudent.studentId = 765432;

       // print data for both students
          // function: printf
       printf( "First student\'s name is %s, and her age is %d\n", 
                                    firstStudent.name, firstStudent.age );

       printf( "Second student\'s GPA is %4.2f, and his student ID is: %ld\n",
                                   secondStudent.GPA, secondStudent.studentId );

       // calculate average gpa
          // function: calcAvgGPA
       averageGPA = calcAvgGPA( firstStudent.GPA, secondStudent.GPA );

       // display average gpa
          // function: printf
       printf( "The average of their GPAs is: %4.2f\n", averageGPA );

       // return success
       return 0;
      }

As can be observed, the dot operator (i.e., a period) is used to show that a value is a member of a given struct item. Other than that, a member behaves like any other variable of its type: you assign data to it, read data from it, and pass it as a parameter in the usual way.

Using a struct, with Pointers

Using dot operators is pretty straightforward and not terribly complicated. However, a point will come when you need to work with pointers to data structures. It turns out that is not terribly complicated either. Consider the same program using dynamically allocated data and pointers.

   int main()
      {
       // initialize function/variables
       struct Student *firstStudent
                      = (struct Student *)malloc( sizeof( struct Student ) );
       struct Student *secondStudent
                      = (struct Student *)malloc( sizeof( struct Student ) );
       double averageGPA;

       // set first student data
          // function: strcpy
       strcpy( firstStudent->name, "Sally" );
       firstStudent->age = 19;
       firstStudent->GPA = 3.46;
       firstStudent->studentId = 654321;

       // set second student data
          // function: strcpy
       strcpy( secondStudent->name, "Ron" );
       secondStudent->age = 18;
       secondStudent->GPA = 3.44;
       secondStudent->studentId = 765432;

       // display data for both students
          // function: printf
       printf( "First student\'s name is %s, and her age is %d\n", 
                                       firstStudent->name, firstStudent->age );

       printf( "Second student\'s GPA is %4.2f, and his student ID is: %ld\n",
                                  secondStudent->GPA, secondStudent->studentId );

       // calculate average gpa
          // function: calcAvgGPA
       averageGPA = calcAvgGPA( firstStudent->GPA, secondStudent->GPA );

       // display average gpa
          // function: printf
       printf( "The average of their GPAs is: %4.2f\n", averageGPA );

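       // release dynamically allocated memory
          // function: free
       free( firstStudent );
       free( secondStudent );

       // return success
       return 0;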
      }

Not bad. Here are some takeaways from the code above.

The operator for accessing member values through a pointer ("->") is called the arrow operator because, well, it looks like an arrow. The dynamic memory allocation looks the same as before, but now you have to tell it to use a struct Student * pointer when you cast, and you have to give sizeof( struct Student ) so it will allocate the correct amount of memory. Since you are already familiar with casting for, and getting the size of, primitives, this should be intuitive.

And to review, arrow operators are used whenever you have a pointer to a struct, which is always the case with dynamically allocated structs, and dot operators are used when you have the struct variable itself, such as one declared locally in the given function. Both operators sit in exactly the same place, between the struct value and its member.

One More Thing, To Make it Easier

You may have noticed that every time you wanted to provide the struct name (e.g., Student), you had to precede it with the word struct. This is standard, but a little bit kludgy. There is a way to clean that up and truly make your data type look like a data type when you use it. It involves a keyword that tells the compiler that a given struct or other data structure is, in fact, a type. And not surprisingly, since it is telling the compiler to "define this thing as a type", the keyword is typedef. The following is how you would put this into play.

   typedef struct structStudent
      {
       char name[ STD_STR_LEN ];
       int age;
       double GPA;
       long studentId;
      } Student;

The first name provided, structStudent, is called the struct tag, and it will still work if a programmer places the keyword struct in front of it (i.e., struct structStudent). That is standard C rather than a glitch, but it will be set aside here for the greater good of reducing the amount of extra code needed in programs. The type name Student will act as a stand-alone data type, so that is what you will use in your code after doing this. The following code will now work without the keyword struct everywhere. Consider the following updated usage.

    // initialize function/variables
    Student *firstStudent 
                   = ( Student *)malloc( sizeof( Student ) );
    Student *secondStudent 
                   = ( Student *)malloc( sizeof( Student ) );

All three places on each initialization line that previously had struct Student now simply have the data type name Student. This will up the readability and lower the amount of code.

Wrapping Up structs - But Actually Not Really

This topic has provided an introduction to both the concept and implementation of the simplest of data structures, a record, or as known in C, a struct. However, this is only the beginning of your interaction with this tool. The struct will end up being a significant part of many of the data structures that will be presented in the next few chapters. In fact, as you transition into Object Oriented Programming with the C++ programming language, you will still be using the struct. Stay Tuned.

