Data Representation

Data Representation in Computing

Data Representation refers to the form in which data is stored, processed and transmitted. Data representation is the process of encoding data in a format that is understandable and usable by computers. In this section, I will explain various aspects of data representation, including data types, units of storage, character representation, coding information using bit patterns, binary arithmetic, and conversions between different number systems.

Data Types

A data type defines the nature of a value stored in a computer: it tells the computer how to interpret the value and dictates which operations can be performed on it. Understanding data types supports efficient memory usage and accurate data manipulation.

Identifying Data Types:

  1. INTEGER: Represents whole numbers, whether positive, negative or zero.

Examples:

  • Positive Integer: 100
  • Negative Integer: -42
  • Zero: 0
  • Large Integer: 987654321
  • Small Integer: -3
  2. FLOATING-POINT: Represents decimal numbers.

Examples:

  • Positive Float: 12.0
  • Negative Float: -0.001
  • Scientific Notation: 6.022e23
  • Fractional Float: 0.5
  • Large Float: 12345.0
  3. STRING: Represents a sequence of characters.

Examples:

  • Name: “Ibrahim Faal”
  • Address: “123 Street, Banjul”
  • Email: “example@email.com”
  • Message: “Hello, how are you?”
  4. CHARACTER: Represents individual characters.

Examples:

  • Alphabets: ‘A’, ‘b’, ‘Z’
  • Numbers: ‘0’, ‘7’, ‘9’
  • Special Characters: ‘@’, ‘$’, ‘&’
  5. BOOLEAN: Represents True or False values.

Examples:

  • True: True
  • False: False
  • Result of Comparison: 5 > 3 (True), 10 == 5 (False)
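
As a rough illustration, the five data types above can be tried out in Python. Python is my assumption here, since nothing above names a language, and the variable names and the type_name helper are made up for the example. Note that Python has no separate character type, so a single character is simply a string of length 1.

```python
# Illustrative values for each data type described above.
# Variable names are hypothetical; Python is assumed as the language.
whole_number = -42          # INTEGER
decimal_number = -0.001     # FLOATING-POINT
full_name = "Ibrahim Faal"  # STRING
initial = "A"               # CHARACTER (a length-1 string in Python)
is_greater = 5 > 3          # BOOLEAN (result of a comparison)


def type_name(value):
    """Return the name of the built-in type of value."""
    return type(value).__name__


for value in (whole_number, decimal_number, full_name, initial, is_greater):
    print(repr(value), "->", type_name(value))
```

The type decides which operations make sense: adding two integers gives their sum, while "adding" two strings joins them together.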

Units of Storage

All data in a computer is stored as bits (0 or 1), whether in RAM or on disk. Every number or letter is translated into bits for storage. In everyday use we write numbers in decimal (base 10), using the digits 0 to 9, but the computer uses only 0s and 1s, which is base 2.

Units of storage measure the capacity of data storage devices and memory. These units include bits, bytes, kilobytes, megabytes, gigabytes, terabytes and so on. Understanding storage units helps in managing and estimating storage requirements for data.

  • Bit (Binary Digit): The smallest unit of storage, representing two states (0 or 1).
  • Nibble: consists of 4 bits (half-byte).
  • Byte: Consists of 8 bits. Commonly used for storing a single keyboard character.
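  • Kilobyte (KB): represents 1,024 bytes in the binary convention used in this list. (Strictly, the SI kilobyte is 1,000 bytes, and 1,024 bytes is a kibibyte, KiB.)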
  • Megabyte (MB): represents 1,024 KB.
  • Gigabyte (GB): represents 1,024 MB.
  • Terabyte (TB): represents 1,024 GB.
  • Petabyte (PB): represents 1,024 TB.
  • Exabyte (EB): represents 1,024 PB.
  • Zettabyte (ZB): represents 1,024 EB.
  • Yottabyte (YB): represents 1,024 ZB.
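
To estimate storage requirements, a byte count can be scaled step by step through the units listed above. The following is a minimal sketch in Python, assuming the 1,024-based convention of this list; the function name human_readable and the two-decimal formatting are my own choices.

```python
# Each unit is 1,024 times the previous one (binary convention used above).
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes):
    """Divide a byte count by 1,024 until it is below 1,024 (or YB is reached)."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


print(human_readable(1024))           # 1.00 KB
print(human_readable(5 * 1024 ** 3))  # 5.00 GB
```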

Character Representation

What appears to you as text on the screen is actually stored as numeric values. Your computer translates these numeric values into visible characters by using an encoding standard. Each character is assigned a numerical code, stored as binary digits, that can be saved or transmitted and later interpreted and displayed as the same character.

When typing a message on your computer, each letter, number, or symbol you type is represented by a unique code. For instance, the letter ‘A’ in uppercase is represented by the number 65 and letter ‘a’ in lowercase is represented by the number 97. This allows computers to understand and display text, numbers and special characters on screen.

Characters include:

  • Letters (both uppercase and lowercase)
  • Numerical digits (0-9).
  • Punctuation marks such as the period (.), hyphen (-) and comma (,).
  • Special Characters like ‘@’, ‘#’, ‘%’, ‘^’, ‘&’, ‘*’.
  • Whitespace, such as the space produced by the spacebar and the tab character.

Character Encoding Standards

Characters are encoded using schemes such as ASCII (American Standard Code for Information Interchange) and Unicode.

ASCII (American Standard Code for Information Interchange):

ASCII is one of the earliest and simplest character encoding standards. It assigns a unique number from 0 to 127 to each character, so it supports 128 different characters, including letters, digits, punctuation marks, and control characters (non-printable characters). ASCII uses 7 bits, with place values 64, 32, 16, 8, 4, 2 and 1, so every ASCII code fits in a 7-bit binary number. Extended ASCII encodings use 8 bits for each character and provide codes for 256 characters.

Example:

  • Letter ‘A’ is represented by the ASCII code 65.
  • Number ‘5’ is represented by the ASCII code 53.
  • Symbol ‘$’ is represented by the ASCII code 36.
  • Space is represented by the ASCII code 32.

How ASCII Encoding Works:

  • When you type a character on your keyboard (let’s say ‘A’), your computer translates it into its corresponding ASCII code (65 in this case) behind the scenes.
  • Similarly, when you see a character on your screen, your computer retrieves its ASCII code and displays the corresponding character.
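
The two directions described above can be tried in Python with the built-in functions ord() and chr(). This is only a sketch: the wrapper names to_code and to_character are mine, and ord() actually returns a Unicode code point, which matches the ASCII code for the characters 0 to 127.

```python
def to_code(character):
    """Character -> numeric code (what happens when a key is typed)."""
    return ord(character)


def to_character(code):
    """Numeric code -> character (what happens when text is displayed)."""
    return chr(code)


print(to_code("A"))      # 65
print(to_character(65))  # A
```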

Here is an example:

Let’s say I want to type my name “Ousman Faal” in a computer. This text contains 10 letters and 1 space making it 11 characters. The table below shows the numerical values (ASCII code) and also the binary representation of the characters in my name.

Character   ASCII Code   Binary
O           79           1001111
u           117          1110101
s           115          1110011
m           109          1101101
a           97           1100001
n           110          1101110
space       32           0100000
F           70           1000110
a           97           1100001
a           97           1100001
l           108          1101100
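
A table like this one can be generated rather than looked up by hand. Here is a small sketch in Python; the function name ascii_table is hypothetical, and format(code, "07b") is used to pad each binary value to 7 bits.

```python
def ascii_table(text):
    """Return (character, decimal code, 7-bit binary) for each character."""
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        rows.append((ch, code, format(code, "07b")))
    return rows


for ch, code, bits in ascii_table("Ousman Faal"):
    label = "space" if ch == " " else ch
    print(f"{label:<9} {code:>3}  {bits}")
```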

Unicode:

Unicode is a comprehensive character encoding standard that supersedes many earlier encodings. It supports over 143,000 characters and continues to expand with each new version. It includes characters from multiple languages, symbols, emojis, and special characters. Whereas ASCII uses 7 or 8 bits per character, Unicode's encoding forms (UTF-8, UTF-16 and UTF-32) are built from 8-bit, 16-bit and 32-bit units.

Example:

  • Letter ‘A’ in Unicode is represented by the code U+0041.
  • Chinese character ‘好’ (meaning ‘good’) is represented by the code U+597D.
  • Emoji ‘😊’ (smiling face) is represented by the code U+1F60A.
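
These codes can be checked in Python, where ord() returns the code point of a character. The sketch below formats it in the usual U+ notation; the function name code_point is my own.

```python
def code_point(character):
    """Format a single character's Unicode code point as U+XXXX."""
    return f"U+{ord(character):04X}"


# The letter A, the Chinese character for 'good', and the smiling-face emoji.
for ch in ("A", "\u597d", "\U0001F60A"):
    print(code_point(ch))
```

The characters are written as escape sequences here only so that the example does not depend on the font or encoding of the editor being used.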

How Unicode Encoding Works:

  • Unicode code points are conventionally written in hexadecimal notation, which uses the digits 0-9 and the letters A-F, prefixed with ‘U+’.
  • For example, the Unicode code point for ‘A’ is U+0041. Here, ‘0041’ is the hexadecimal form of 65, the ASCII code for ‘A’.
  • The first 128 Unicode code points (U+0000 to U+007F) are identical to ASCII, so converting ASCII text to Unicode leaves the codes unchanged. Characters outside that range, such as ‘好’, have no ASCII equivalent.

Unicode formats include:

  • UTF-8: Uses 1 to 4 bytes per character; ASCII characters take 1 byte (widely used on the web).
  • UTF-16: Uses 2 or 4 bytes per character.
  • UTF-32: Uses 4 bytes for every character.
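
How many bytes a given character occupies in each format can be measured with Python's str.encode(). This is a sketch only; the function name encoded_lengths is hypothetical, and the "-le" codec variants are used so that no byte order mark is counted.

```python
def encoded_lengths(character):
    """Return the number of bytes a character needs in each UTF format."""
    return {
        "UTF-8": len(character.encode("utf-8")),
        # The "-le" variants skip the byte order mark that plain
        # "utf-16" / "utf-32" would prepend.
        "UTF-16": len(character.encode("utf-16-le")),
        "UTF-32": len(character.encode("utf-32-le")),
    }


# The letter A, the Chinese character U+597D, and the emoji U+1F60A.
for ch in ("A", "\u597d", "\U0001F60A"):
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

For these three characters UTF-8 needs 1, 3 and 4 bytes respectively, UTF-16 needs 2, 2 and 4, and UTF-32 always needs 4.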

Coding Information Using Bit Patterns

Information is coded using bit patterns, where each bit represents a binary value of either 0 or 1. Coding schemes such as binary, octal, and hexadecimal are used to represent data in a compact and efficient manner. Understanding bit patterns is essential for data transmission, storage, and manipulation.

Below is an example of an 8-bit pattern. Remember that a group of 8 bits is referred to as a byte. Eight bits can form 256 different patterns of 0s and 1s, representing the values 0 to 255. If all 8 bits are 0, the value is 0; if all 8 bits are 1, the value is 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255.

128  64  32  16   8   4   2   1
  0   0   0   0   0   0   0   0   = 0

128  64  32  16   8   4   2   1
  1   1   1   1   1   1   1   1   = 255
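
The value of any 8-bit pattern is found by adding the place values of the positions that hold a 1. The following Python sketch does exactly that; the names PLACE_VALUES and pattern_value are mine, and the pattern is assumed to be given as a string of eight 0s and 1s.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def pattern_value(bits):
    """Add up the place values of the positions that hold a 1."""
    if len(bits) != 8 or any(b not in "01" for b in bits):
        raise ValueError("expected a string of exactly eight 0s and 1s")
    return sum(place for place, bit in zip(PLACE_VALUES, bits) if bit == "1")


print(pattern_value("00000000"))  # 0
print(pattern_value("11111111"))  # 255
print(2 ** 8)                     # 256 possible patterns
```

The largest value is 255 rather than 256 because counting starts at 0, so 256 patterns cover the range 0 to 255.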

Number Systems 

Decimal (Base-10) – The decimal system is the standard system for denoting integer and non-integer numbers. It is also known as the base-10 numeral system and uses ten different digits, which are 0 to 9. Each position in a decimal number represents a power of 10, with the rightmost digit representing (10^0), the next digit to the left representing (10^1), and so on.

Binary (Base-2) – The binary system is used internally by almost all modern computers and computer-based devices because it is straightforward to implement with digital electronic circuitry. The binary system uses only two digits, 0 and 1, and each position in a binary number represents a power of 2, with the rightmost digit representing (2^0), the next digit to the left representing (2^1), and so on.

Octal (Base-8) – The octal number system uses eight digits, ranging from 0 to 7. It is sometimes used in computing as a more compact representation of binary numbers because it is easier for humans to read. Each position in an octal number represents a power of 8.

Hexadecimal (Base-16) – The hexadecimal system is another number system widely used in computing. It uses sixteen distinct symbols: the digits 0 to 9 and the letters A to F, which together represent the values zero to fifteen. Each position in a hexadecimal number represents a power of 16.

These systems are essential in various fields, especially computing and digital electronics, where binary is used for processor-level operations, octal can simplify binary representation, and hexadecimal is often used in programming to represent memory addresses.
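
To see the four systems side by side, the same number can be written in each base. The sketch below uses Python's built-in format() for the conversion and then rebuilds the value from its digits using the "power of the base" rule described above. The function names representations and positional_value are hypothetical, and only non-negative whole numbers are considered.

```python
def representations(number):
    """Return a non-negative integer written in base 2, 8, 10 and 16."""
    return {
        "binary": format(number, "b"),
        "octal": format(number, "o"),
        "decimal": format(number, "d"),
        "hexadecimal": format(number, "X"),
    }


def positional_value(digits, base):
    """Evaluate a digit string by summing digit * base**position."""
    total = 0
    # The rightmost digit has position 0, the next one position 1, and so on.
    for position, symbol in enumerate(reversed(digits)):
        digit = int(symbol, 16)  # accepts 0-9 and A-F
        if digit >= base:
            raise ValueError(f"{symbol!r} is not a valid base-{base} digit")
        total += digit * base ** position
    return total


print(representations(65))
print(positional_value("1000001", 2))  # 65
print(positional_value("41", 16))      # 65
```

In everyday Python code the built-in int("41", 16) performs the same calculation; the loop is spelled out here only to show the positional rule at work.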