
Memory Sizes

Computer memory involves really big numbers. Incomprehensibly big, sometimes. To make sense of them, conventions have grown up to make writing and thinking about these numbers easier for humans, with different conventions being most useful in different contexts.

BIT : Binary digit, a single unit of information that can be either 0 or 1, on or off. The transistors in computers are essentially tiny on-off switches, so everything in a computer is based on powers of 2 for economy's sake; anything that is logically not a power of two has to be built out of powers of two, often by rounding up to the next power of two and wasting all the excess capacity.

BYTE : Historically, the unit of encoding for a single character of text in the computer's native encoding. Today it is ubiquitously defined as a group of 8 bits: a number between 0 and 255, or 256 unique combinations in total. There is no single authoritative answer for why 8, but most likely it is because 8 is itself a power of two (\(2^3\)), and \(2^8\) is 256, which comfortably holds the entire English alphabet, numerals, and the special characters found on the standard keyboard, with room to spare.

WORD : Historically, a word is a single unit of data that is the natural size for moving data between system RAM and CPU registers. On x86 and x86-64 systems, a word is defined as 16 bits (2 bytes), because the 8086 CPU that started the x86 line had a 16-bit data bus, 16-bit registers, and a 16-bit instruction set. When the 80386 CPU launched in 1985 and expanded the architecture to 32-bit registers and addressing, IBM PC compatibles were still running 16-bit operating systems; the first mainstream fully 32-bit OS, OS/2 2.0, did not arrive until 1992, and consumer Windows would not get native 32-bit support until Windows 95, which launched in, you guessed it, 1995. Even then, those systems still required legacy support, so a word remained defined as 16 bits, and "double-word" was defined for 32-bit data. When the 64-bit extension was introduced in 2003, "quad-word" was defined for 64-bit data.
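
For a concrete picture of those sizes, here is a minimal C sketch (assuming a typical modern x86-64 toolchain; the names are just illustrative) that prints the bit widths of the fixed-width integer types that line up with byte, word, double-word, and quad-word:

/* Minimal sketch: byte/word/double-word/quad-word widths, using C's
   fixed-width integer types. Assumes a typical x86-64 toolchain. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    printf("byte        (uint8_t):  %zu bits\n", 8 * sizeof(uint8_t));   /*  8 */
    printf("word        (uint16_t): %zu bits\n", 8 * sizeof(uint16_t));  /* 16 */
    printf("double-word (uint32_t): %zu bits\n", 8 * sizeof(uint32_t));  /* 32 */
    printf("quad-word   (uint64_t): %zu bits\n", 8 * sizeof(uint64_t));  /* 64 */
    return 0;
}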

NIBBLE : Only rarely encountered, but a nibble is half a byte, i.e. 4 bits or a single hex digit (get it, a nibble ... a small byte ...). I've only ever heard it used in trivia; once, shortly after it came up in trivia, my office took to calling each hex digit a nibble for a couple of hours.
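
Since a nibble is exactly one hex digit, you can pull a byte apart into its two nibbles with a shift and a mask. A minimal C sketch (the value 0xC3 is just an arbitrary example):

/* Minimal sketch: splitting a byte into its high and low nibbles,
   one hex digit each. The value 0xC3 is arbitrary. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t  b  = 0xC3;              /* 1100 0011 in binary     */
    unsigned hi = (b >> 4) & 0x0F;   /* high nibble: 0xC (1100) */
    unsigned lo = b & 0x0F;          /* low  nibble: 0x3 (0011) */
    printf("byte 0x%02X -> nibbles 0x%X and 0x%X\n", (unsigned)b, hi, lo);
    return 0;
}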

PREFIXES

KILOBYTE : aka KB : \(2^{10}\), or 1,024 bytes.

MEGABYTE : aka MB : \(2^{20}\), or 1,048,576 bytes.

GIGABYTE : aka GB : \(2^{30}\), or 1,073,741,824 bytes.

TERABYTE : aka TB : \(2^{40}\), or 1,099,511,627,776 bytes.

PETABYTE : aka PB : \(2^{50}\), or 1,125,899,906,842,624 bytes.

It is important to note that, due to historical usage, these prefixes share their names with the SI (metric) prefixes but mean something different, and only when dealing with memory sizes. When used for other things, such as gigahertz for CPU clock speed or gigabits per second for network speed, the prefixes indicate powers of 10, e.g. 1 GHz = \(10^{9}\) Hz = 1,000,000,000 cycles per second.
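
To see how far apart the two meanings drift, here is a small C sketch (nothing here comes from any particular API, it is just arithmetic) comparing the memory meaning of "giga" (\(2^{30}\)) with the SI meaning (\(10^{9}\)):

/* Minimal sketch: the same "giga" prefix, two meanings. 1 GB of memory
   is 2^30 bytes, while 1 GHz or 1 Gb/s uses the SI meaning of 10^9. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t gb_binary = 1ULL << 30;    /* 1,073,741,824 */
    uint64_t giga_si   = 1000000000ULL; /* 1,000,000,000 */

    printf("1 GB (memory, 2^30): %llu bytes\n", (unsigned long long)gb_binary);
    printf("1 G  (SI,     10^9): %llu\n",       (unsigned long long)giga_si);
    printf("difference:          %llu\n",
           (unsigned long long)(gb_binary - giga_si)); /* 73,741,824 */
    return 0;
}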

Numeral Systems

BINARY : A number system where each digit has 2 (bi) possible values, 0 or 1.

OCTAL : A number system where each digit has 8 (oct) possible values, 0 through 7.

DECIMAL : A number system where each digit has 10 (deci) possible values, 0 through 9. This is the everyday number system we all learned in school.

HEXADECIMAL : A number system where each digit has 16 (hex + deci) possible values, 0 through 9, and A through F. Colloquially called just "Hex".

If you want to understand advanced memory concepts, you will have to learn to read and convert binary and hexadecimal numbers. Octal has mostly fallen out of use except for a few rare cases where the information being encoded comes in repeating groups of 3 bits (Unix file permissions are the classic example). If it hasn't clicked yet, a single hexadecimal digit encodes 4 bits, and two hexadecimal digits encode a byte. 16-, 32-, and 64-bit numbers are written as hex numbers with 4, 8, and 16 digits respectively.
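
As a quick illustration of those widths, here is a C sketch that prints the same arbitrary value zero-padded to 4, 8, and 16 hex digits, one digit per 4 bits:

/* Minimal sketch: one hex digit per 4 bits, so 16-, 32-, and 64-bit
   values take 4, 8, and 16 hex digits. The value 4096 is arbitrary. */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint64_t v = 4096;                                  /* 0x1000 */
    printf("16-bit: 0x%04" PRIX16 "\n", (uint16_t)v);   /* 0x1000             */
    printf("32-bit: 0x%08" PRIX32 "\n", (uint32_t)v);   /* 0x00001000         */
    printf("64-bit: 0x%016" PRIX64 "\n", v);            /* 0x0000000000001000 */
    return 0;
}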

Chart

 Dec |    Binary   |    Hex
 000 |   0000 0000 |   0x00
 001 |   0000 0001 |   0x01
 002 |   0000 0010 |   0x02
 003 |   0000 0011 |   0x03
 004 |   0000 0100 |   0x04
 005 |   0000 0101 |   0x05
 006 |   0000 0110 |   0x06
 007 |   0000 0111 |   0x07
 008 |   0000 1000 |   0x08
 009 |   0000 1001 |   0x09
 010 |   0000 1010 |   0x0A
 011 |   0000 1011 |   0x0B
 012 |   0000 1100 |   0x0C
 013 |   0000 1101 |   0x0D
 014 |   0000 1110 |   0x0E
 015 |   0000 1111 |   0x0F
 016 |   0001 0000 |   0x10
 032 |   0010 0000 |   0x20
 048 |   0011 0000 |   0x30
 064 |   0100 0000 |   0x40
 128 |   1000 0000 |   0x80
 192 |   1100 0000 |   0xC0
 255 |   1111 1111 |   0xFF
 256 | 1 0000 0000 |  0x100
 512 |    too many |  0x200
4096 |      no way | 0x1000

This is certainly not an exhaustive chart; it is meant more as a quick example of how binary relates to hex. At a certain point, writing binary by hand becomes too error-prone, because each zero or one looks just like the ones next to it.

The important thing to recognize is that each hex digit encodes 4 binary digits, and if you HAVE to convert from hex to decimal, recognize that just as 837 in decimal encodes \(8(10^{2}) + 3(10^{1}) + 7(10^{0})\), the hex number 0xD36 encodes \(13(16^2) + 3(16^1) + 6(16^0)\). (If you missed it, hex D is decimal 13.) That is definitely not intuitive or convenient, which is why few people go out of their way to convert hex to decimal by hand (use a programming calculator). Binary to hex, on the other hand, is trivial: break the binary into groups of 4 digits (4 bits, if that is easier to think of), and there are only 16 different combinations to decode.
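
If you ever do need to check a conversion, a few lines of C will do it. This sketch decodes 0xD36 by hand with the positional arithmetic above, then double-checks the result with the standard library's strtol:

/* Minimal sketch: hex to decimal by positional arithmetic, checked
   against strtol parsing the same digits in base 16. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    long by_hand   = 13 * 16 * 16 + 3 * 16 + 6;  /* 13(16^2) + 3(16^1) + 6(16^0) */
    long by_strtol = strtol("D36", NULL, 16);    /* parse "D36" as base 16       */

    printf("by hand: %ld\n", by_hand);    /* 3382 */
    printf("strtol:  %ld\n", by_strtol);  /* 3382 */
    return 0;
}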

Pointers and Addresses

A pointer is an unsigned integer value (technically a natural number) that encodes a memory address. I have never seen a pointer written in anything other than hexadecimal notation, though the computer stores it in transistors either way; the hex is for humans, and there is no reason a pointer couldn't be written in another form. The two most common sizes you will encounter on modern machines are 32 and 64 bits, and the leading zeroes are often dropped for ease of reading.

32-bit:

0x00000000 == NULL pointer
0x90d8c000 == random pointer that's page boundary aligned.
0x7F0500   == 0x007F0500  with leading zeroes dropped

Generally, 32-bit pointers didn't have their leading zeroes dropped because it made little difference; the computer still encoded the full value, and there were never enough zeroes to matter.

64-bit:

0x0000000000000000 == NULL pointer
0x00007ffe7c043000 == random pointer that's also page boundary aligned.
0x7F0500           == 0x00000000007F0500

With 64 bits it makes a real difference for readability, since often more than half of the digits can be left off.
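
To see real pointer values for yourself, here is a small C sketch (assuming a 64-bit system, since the zero-padding is hard-coded to 16 hex digits; the variable names are just illustrative) that prints a stack address, a heap address, and NULL, both as %p normally shows them and fully zero-padded:

/* Minimal sketch: printing pointers. The actual addresses are whatever
   the system hands out, so the output differs on every machine and run.
   Assumes 64-bit pointers for the hard-coded 16-digit zero padding. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    int  x         = 42;
    int *stack_ptr = &x;                  /* address of a local variable  */
    int *heap_ptr  = malloc(sizeof(int)); /* address of a heap allocation */
    int *null_ptr  = NULL;                /* the all-zeroes pointer       */

    /* %p prints an implementation-defined (usually hex) form; casting
       through uintptr_t shows the full zero-padded 64-bit value. */
    printf("stack: %p  (0x%016" PRIXPTR ")\n", (void *)stack_ptr, (uintptr_t)stack_ptr);
    printf("heap:  %p  (0x%016" PRIXPTR ")\n", (void *)heap_ptr,  (uintptr_t)heap_ptr);
    printf("NULL:  %p  (0x%016" PRIXPTR ")\n", (void *)null_ptr,  (uintptr_t)null_ptr);

    free(heap_ptr);
    return 0;
}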