Introduction to Data Abstraction
This topic covers how we represent data in a program. The process itself is fairly simple, but there are some complicating factors that should be discussed. To almost every computer in general use today, data looks exactly like the commands: it is a series of 1s and 0s. However, it is difficult for humans to manipulate long strings of digits, so programming languages let us manage data in more human-friendly ways. Learning about these ways is our next step.
Using Data in a Program
Abstraction. As mentioned above, computers have no problem holding data. A computer might, for example, hold the data value 19 as 00000000 00010011, and it might hold that value at a memory location numbered 26592 in decimal, which is the form we are familiar with. The same location is 01100111 11100000 in binary, as the computer would understand it, or 67E0 in hexadecimal, as it is commonly displayed for programmers.
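The binary and hexadecimal forms quoted above can be checked with a short sketch. The helper names formatBinary16 and formatHex16 are hypothetical, made up for this illustration, and the sketch assumes we only care about the lowest 16 bits of a value.

```c
#include <stdio.h>

/* Hypothetical helper: writes the low 16 bits of value into out as two
   groups of eight binary digits separated by a space. The out array needs
   room for 18 characters (16 digits, 1 space, 1 terminating '\0'). */
void formatBinary16(unsigned int value, char out[18])
{
    int position = 0;

    /* Walk from the highest of the 16 bits down to the lowest. */
    for (int bit = 15; bit >= 0; bit--) {
        out[position++] = ((value >> bit) & 1u) ? '1' : '0';

        /* After the eighth digit, add the space between the two bytes. */
        if (bit == 8) {
            out[position++] = ' ';
        }
    }
    out[position] = '\0';
}

/* Hypothetical helper: writes the low 16 bits of value into out as four
   hexadecimal digits. The out array needs room for 5 characters. */
void formatHex16(unsigned int value, char out[5])
{
    snprintf(out, 5, "%04X", value & 0xFFFFu);
}
```

Called from a main function, formatBinary16 turns 19 into 00000000 00010011 and 26592 into 01100111 11100000, and formatHex16 turns 26592 into 67E0, matching the values in the text.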
Note that some computers store the bytes of a multi-byte value in the opposite order (a property called byte order, or endianness), so these numbers may appear with their first and second halves swapped, but that is not a critical part of our discussion right now. The point is that looking at raw numbers this way is messy, if not confusing. That is the bad news. The good news is that most programming languages do not make you manage data at the computer level; they cover up the messy details and make holding data much easier.
The "covering up" process is called abstraction, which is the process of using one thing to represent another without requiring you to worry about the smaller details of the thing represented. For data, the programming language allows you to create your own name for each piece of data that you want to hold. The name that represents data in a program is called a variable, which in the C programming language is one kind of identifier.
Identifiers. First, an identifier is any contiguous, or uninterrupted (i.e., no dividing spaces), combination of letters, digits, and underscores that is used to name some part of a program; an identifier must start with a letter or an underscore (e.g., myVariable or _variable). However, even though it is allowed, it is not a good idea to start variable names with an underscore: the C standard reserves many names with a leading underscore or two for the compiler and its libraries, and you don’t want your names to conflict with theirs. Besides, you have plenty of options with names that start with letters.
When creating identifiers, you should use character or word combinations that clearly specify what data the variable will hold. This often means a “multi-word” identifier such as myPayRate or hisAge for variables, and calcDiscriminant or findMaxValue when you begin creating your own functions. Notice that identifiers are used for programming components other than variables. Also notice that these identifiers start with a lower case letter and use an upper case letter to mark the start of each new word. This convention is called camel case, because the capital letters in the middle of the name stand up like the humps of a camel.
One more thing to note is that if the identifier names a “thing”, which is usually a variable, the identifier should be a noun; however, if the identifier describes an action, such as the work done by a function, the identifier should start with a verb. Note the function identifiers used previously; others might be named convertString, or isEven, or getRandomValue. All of these are names that might be used for functions since they start with verbs.
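As a small sketch of these naming conventions, the functions below use verb-first camel case names and noun-like names for the values they work on. The names isEven, findMaxValue, and myPayRate come from the discussion above; calcWeeklyPay, hoursWorked, and the other parameter names are hypothetical ones added for illustration. The bool type used here comes from stdbool.h.

```c
#include <stdbool.h>

/* Function names start with a verb: "is", "find", "calc". */
bool isEven(int number)
{
    /* A number is even when dividing by 2 leaves no remainder. */
    return number % 2 == 0;
}

int findMaxValue(int firstValue, int secondValue)
{
    /* Variable names are nouns that say what the value is. */
    return (firstValue > secondValue) ? firstValue : secondValue;
}

/* Hypothetical function: multiplies a pay rate by the hours worked. */
double calcWeeklyPay(double myPayRate, double hoursWorked)
{
    return myPayRate * hoursWorked;
}
```

Reading a call such as findMaxValue(hisAge, herAge) aloud already tells you what is being done and to what, which is the purpose of the convention.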
Variables. To continue with identifiers and complete the story on variables, a variable is an identifier that is used to represent, and cover up, all the messy memory stuff mentioned previously, and simply hold data for you.
Examples of data held by variables:
a variable named myName might hold the value "bill"
a variable named hisAge might hold the integer value 17
a variable named balance might hold the floating point value 23.44
a variable named file2 might hold the name of a text file, such as "myfile.txt"
a variable named middleInitial might hold the first letter of your middle name, such as ‘A’
As you can see, there are different kinds of data. However, they have one thing in common. All variables are like buckets that can hold a certain specified kind of data. You can put any value in the bucket as long as it is the appropriate kind of data such as a character, a floating point value, an integer, and so on. The bonus is that you can label the bucket so that when you go back to the barn (or your program) to get some value that you stored, you will be able to get the right value if you know what the bucket's (variable's) name is.
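The example variables listed above might be written in C roughly as follows. This is only a sketch: the function name showExampleVariables is hypothetical, and it assumes that text values such as a name are held as arrays of characters, which is one common way to hold text in C.

```c
#include <stdio.h>

/* Hypothetical demonstration function: creates the example "buckets",
   prints what each one holds, then puts a new value in one of them.
   Returns the final value of hisAge. */
int showExampleVariables(void)
{
    char myName[] = "bill";          /* text held as an array of characters */
    int hisAge = 17;                 /* integer value */
    double balance = 23.44;          /* floating point value */
    char file2[] = "myfile.txt";     /* the name of a text file */
    char middleInitial = 'A';        /* a single character */

    printf("myName: %s\n", myName);
    printf("hisAge: %d\n", hisAge);
    printf("balance: %.2f\n", balance);
    printf("file2: %s\n", file2);
    printf("middleInitial: %c\n", middleInitial);

    /* The bucket keeps its label and its kind of data,
       but the value inside it can be replaced. */
    hisAge = 18;

    return hisAge;
}
```

Calling showExampleVariables() from a main function prints the five values by name, without the program ever mentioning a memory location.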
Primitive or fundamental data. Since there is more than one kind of data, the program must be told which kind of data each variable will hold. For that reason, you must declare each variable, giving its data type, before you use it in the program. In C, the primitive data types we will use are integer, represented by int, floating point, represented by double, and character, represented by char. The Boolean data type, represented by bool, can also be used and will be treated as a fundamental type; however, in versions of C before C23 the name bool is not built into the language and is made available by including a standard header called stdbool.h. You will learn more about these data types in the next few sections.
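To see why the data type matters, here is a minimal sketch showing that the declared type decides what a variable can hold. The function names keepWholePart, hasFunds, and getFirstLetter are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>   /* makes bool, true, and false available */

/* An int bucket cannot hold a fraction: converting a double
   to an int drops everything after the decimal point. */
int keepWholePart(double balance)
{
    int wholePart = (int)balance;
    return wholePart;
}

/* A bool holds only true or false. */
bool hasFunds(double balance)
{
    bool result = (balance > 0.0);
    return result;
}

/* A char holds a single character, here the first one in some text. */
char getFirstLetter(const char text[])
{
    char firstLetter = text[0];
    return firstLetter;
}
```

For example, keepWholePart(23.44) gives 23, because the int variable wholePart has no room for the .44.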