Network Basics and Network Abstraction in Linux
Motivation
Before you learn the tools and commands for using the network in Linux, you need a basic understanding of how networks work. This unit tries to bring you up to speed quickly.
The ISO-OSI 7 Layer Model
The 7 Layer Model is used to describe networks. The IP protocol was not developed within ISO and thus only roughly fits into the model. Still, it is a good picture to have in your mind when you think about networks.
At the top you have your applications, e.g. a web browser. Below that you need definitions of how websites are encoded and transported via HTTP, and so on down the layers. At the bottom we need specifications of how data is transmitted over the wire (or wirelessly), e.g. cable definitions, voltage levels, frequencies, etc.
What we are looking at here is layer 2, which defines how data is encoded on a certain medium. In later units we also look at layer 3 (routing: how packets are sent between networks).
Layer 2
From the point of view of the operating system's abstraction, there are basically 2 different kinds of physical medium:
- broadcast
- A local network where stations can send to each other and where there is also a way to send to all stations on the network at once. Typically an Ethernet network segment or a WiFi network.
- point-to-point
- Two stations connected via a link, where only those 2 stations can exchange data. Typically a dial-up connection, a network over a serial line or a virtual connection like a VPN tunnel.
The typical broadcast medium is Ethernet, and most network interfaces are of this type. In Ethernet a 6-byte address is used to address each station on the network. This is the so-called hardware address or MAC address. It is usually written as 12 hex digits grouped into bytes by colons, e.g. b0:35:9f:2a:29:7d. Each network card should have a unique MAC address. The first digits are assigned to a company and the last digits are counted up in the factory. The address mentioned belongs to an Intel card.
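The address format can be illustrated with a few lines of Python. This is only a sketch: the function names `parse_mac` and `split_mac` are made up for this example, and it assumes the usual split where the first 3 bytes are the vendor prefix and the last 3 bytes are assigned by the vendor.

```python
def parse_mac(text):
    """Convert 'b0:35:9f:2a:29:7d' into 6 raw bytes."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated bytes, got {len(parts)}")
    return bytes(int(part, 16) for part in parts)


def split_mac(text):
    """Return (vendor_prefix, device_part) as colon-separated hex strings."""
    raw = parse_mac(text)
    vendor = ":".join(f"{b:02x}" for b in raw[:3])   # assigned to the company
    device = ":".join(f"{b:02x}" for b in raw[3:])   # counted up by the company
    return vendor, device


mac = "b0:35:9f:2a:29:7d"
print(len(parse_mac(mac)))   # 6
print(split_mac(mac))        # ('b0:35:9f', '2a:29:7d')
```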
In the old days Ethernet was built with a coaxial cable that connected all computers. Today Ethernet is usually built with twisted-pair cables and RJ45 connectors. The cables run to a central switch or hub that distributes the packets to the stations. A hub distributes every packet to every station. A switch is more intelligent: it learns the MAC address of each station and only forwards packets to the computer that was addressed. Of course, broadcasts are always sent to all stations on that segment.
Most of the time we want to send TCP/IP packets. Those are encoded as payload within the Ethernet frame. Within the TCP/IP packet there could be, e.g., an HTTP request.
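The nesting of payloads can be sketched in Python. The example below builds only the 14-byte Ethernet II header (destination MAC, source MAC, type field) and omits the preamble and checksum that the hardware adds. The helper names and the placeholder payload are assumptions for illustration; a real IP packet would have binary IP and TCP headers in its place.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # type field value that marks an IPv4 payload


def mac_bytes(text):
    """Convert a colon-separated MAC address into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Ethernet II header (6 + 6 + 2 bytes) followed by the payload."""
    header = struct.pack("!6s6sH", mac_bytes(dst_mac), mac_bytes(src_mac), ethertype)
    return header + payload


# Placeholder bytes standing in for a real IP packet with TCP and HTTP inside.
ip_packet = b"<IP header><TCP header>GET / HTTP/1.1\r\n\r\n"
frame = build_frame("18:d6:c7:f7:f3:2e", "b0:35:9f:2a:09:9d", ETHERTYPE_IPV4, ip_packet)

dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
print(len(frame) - len(ip_packet))  # 14 bytes of Ethernet header
print(hex(ethertype))               # 0x800
```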
With IP we are already moving to layer 3.
Network Abstraction in Linux
If we use a network card from a different vendor, we do not want to rewrite all our programs. So the Linux system has drivers for all the different network cards, and once the right drivers are installed we do not have to care about the particularities of each card. We only see a network interface. For the most part we also do not want to care about the details of sending packets, re-transmitting those that are lost, etc. We just want a connection to youtube.com to watch funny hamsters dancing. The Linux kernel provides most of the needed abstraction here:
At the bottom the Linux kernel has drivers for each type of card. Most of the protocol handling for Ethernet, IP and TCP is done in the kernel. User programs connect via a standardized library (libc) that offers them convenient functions for opening network connections, where they only need to specify the destination IP address and port.
Of course we also need tools to configure the network. The abstraction from the Linux kernel gives us so-called interfaces. Most other hardware in Unix is typically abstracted as a device that has a device file below /dev. E.g. /dev/sda could be your hard drive, while /dev/ttyUSB0 would be a serial port from a USB device. Network interfaces are different: they do not have device files, only interface names. You can list the interfaces with:
    $ ifconfig
    $ ip link
    $ ip addr
The ifconfig tool is actually deprecated because it does not support all the features of the Linux kernel anymore. The tool to use is ip. ip link shows all your network interfaces and what type they are. ip addr also shows the IP addresses.
Here is an output of both ifconfig and ip addr:
    $ ifconfig
    ...
    wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 192.168.79.105  netmask 255.255.255.0  broadcast 192.168.79.255
            inet6 fe80::290d:840f:e5a6:e72b  prefixlen 64  scopeid 0x20<link>
            ether b0:35:9f:2a:09:9d  txqueuelen 1000  (Ethernet)
            RX packets 210183  bytes 215596586 (205.6 MiB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 91984  bytes 20003418 (19.0 MiB)
            TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

    $ ip addr
    ...
    3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether b0:35:9f:2a:09:9d brd ff:ff:ff:ff:ff:ff
        inet 192.168.9.105/24 brd 192.168.9.255 scope global dynamic noprefixroute wlp3s0
           valid_lft 6556sec preferred_lft 6556sec
        inet6 fe80::290d:840f:e5a6:e72b/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
Your output will have more interfaces, but here I only show one: wlp3s0, which is actually my WiFi card. You see the MAC address and the IP addresses. In the ifconfig output you also see the number of incoming and outgoing packets and the number of bytes. You can see these statistics in ip with the -s option, e.g. ip -s addr.
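The same information is also available to programs. The following sketch lists the interfaces with Python's `socket.if_nameindex()` and reads the MAC address and byte counters from the files Linux exposes under /sys/class/net. The function names are made up for this example, and on systems without that directory the values simply come back as `None`.

```python
import socket
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # Linux-specific; absent on other systems


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_interfaces():
    """Collect name, MAC address and traffic counters for each interface."""
    result = []
    for _index, name in socket.if_nameindex():
        base = SYSFS_NET / name
        result.append({
            "name": name,
            "mac": read_sysfs(base / "address"),
            "rx_bytes": read_sysfs(base / "statistics" / "rx_bytes"),
            "tx_bytes": read_sysfs(base / "statistics" / "tx_bytes"),
        })
    return result


for iface in list_interfaces():
    print(iface)
```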
IPv4 vs IPv6
Most of the Internet still uses IP version 4 (IPv4 for short) with its 2^32 addresses, written in the well-known form of 4 decimal numbers separated by 3 dots, e.g. 192.168.92.113. Linux has also supported IPv6 for a long time, with its 128-bit addresses written as 8 groups of 16-bit numbers in hex, separated by colons, e.g. 2001:0db8:85a3:0000:0000:8a2e:4370:c33a
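Both notations are just ways of writing a number. As a small illustration, Python's standard `ipaddress` module can parse the two example addresses and show their size and their shortened form:

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.92.113")
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:4370:c33a")

# An address is just a number: 32 bits for IPv4, 128 bits for IPv6.
print(v4.version, v4.max_prefixlen)   # 4 32
print(v6.version, v6.max_prefixlen)   # 6 128

# The dotted form packs four 8-bit numbers into one 32-bit integer.
a, b, c, d = (int(part) for part in "192.168.92.113".split("."))
assert int(v4) == (a << 24) | (b << 16) | (c << 8) | d

# IPv6 addresses are usually shortened: leading zeros are dropped and
# one run of all-zero groups is replaced by "::".
print(v6.compressed)   # 2001:db8:85a3::8a2e:4370:c33a
print(v6.exploded)     # 2001:0db8:85a3:0000:0000:8a2e:4370:c33a
```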
Since most companies do not want a direct network connection to the outside but use a firewall where they can also run NAT (Network Address Translation), most company networks use private IP addresses inside, and most of their infrastructure still runs on IPv4.
I will also focus on IPv4 here. Most tools in Linux also support IPv6: either you append a 6 to the name of the tool or you use the option -6.
IP over Ethernet
We learned that local communication between computers on an Ethernet happens via Ethernet frames that are addressed with MAC addresses, and that the TCP/IP packets are the payload of the Ethernet frames. But how does a station know which IP addresses are on the local network, and which MAC address belongs to which IP address?
The first part is solved via configuration. When an interface is set up, we tell it which range of IP addresses belongs to the local network. For this the CIDR (Classless Inter-Domain Routing) notation is used. E.g. we say that 192.168.1.0/24 is the network that can be found on a network interface. This means that all addresses whose leading 24 bits are 192.168.1 can be found on that network, whatever their last 8 bits are. (In this case the last 8 bits correspond to the last number behind the dot.)
CIDR
In CIDR the boundary can be on any bit, but /24 is often used. E.g. we could define 10.11.12.128/25, where we would have all IPs from 10.11.12.128 to 10.11.12.255. Or we could have 192.168.99.64/29, where we would have the range 192.168.99.64 to 192.168.99.71, etc. When we have a /24 we call it a class-C network, a /16 is a class-B network and a /8 is a class-A network. Everything else is classless.
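The ranges mentioned here can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the helper name `describe` and the two test addresses at the end are chosen just for this example.

```python
import ipaddress


def describe(cidr):
    """Return (first address, last address, number of addresses) of a network."""
    net = ipaddress.ip_network(cidr)
    return str(net[0]), str(net[-1]), net.num_addresses


print(describe("10.11.12.128/25"))   # ('10.11.12.128', '10.11.12.255', 128)
print(describe("192.168.99.64/29"))  # ('192.168.99.64', '192.168.99.71', 8)

# Deciding whether a destination is on the local network is a membership test.
local = ipaddress.ip_network("192.168.1.0/24")
print(ipaddress.ip_address("192.168.1.77") in local)   # True
print(ipaddress.ip_address("192.168.2.77") in local)   # False
```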
Instead of using the CIDR / notation, the network is often specified with the netmask. The netmask is written like an IP address in which all bits that are part of the network are set to 1. So /24 corresponds to 255.255.255.0.
Here is a short table:
CIDR | netmask | class | example | number of addresses in the range |
---|---|---|---|---|
/24 | 255.255.255.0 | C | 192.168.3.0/24 | 256 |
/16 | 255.255.0.0 | B | 10.11.0.0/16 | 65536 |
/8 | 255.0.0.0 | A | 10.0.0.0/8 | 16777216 |
/23 | 255.255.254.0 | | 192.168.40.0/23 | 512 |
/27 | 255.255.255.224 | | 192.168.0.64/27 | 32 |
/0 | 0.0.0.0 | | 0.0.0.0/0 | 4294967296 (the entire internet) |
/32 | 255.255.255.255 | | 10.11.12.13/32 | 1 (one host only) |
Try to find the netmask/CIDR information in the ifconfig/ip addr output from above.
The highest IP address in each network (the one where all host bits, i.e. the bits not covered by the netmask, are set to 1) is usually reserved as the broadcast address and should not be used for normal stations. E.g. 192.168.0.255 in 192.168.0.0/24.
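How a prefix length turns into a netmask and a broadcast address is plain bit arithmetic. The following sketch does it by hand, without any library, so the individual steps are visible. All function names are made up for this example.

```python
def to_int(dotted):
    """Pack a dotted IPv4 address into one 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Write a 32-bit integer as 4 decimal numbers separated by dots."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def netmask(prefix_len):
    """Netmask with the top prefix_len bits set to 1."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def broadcast(network, prefix_len):
    """Network address with all host bits set to 1."""
    host_bits = ~netmask(prefix_len) & 0xFFFFFFFF
    return to_int(network) | host_bits


print(to_dotted(netmask(24)))                    # 255.255.255.0
print(to_dotted(netmask(23)))                    # 255.255.254.0
print(to_dotted(netmask(27)))                    # 255.255.255.224
print(to_dotted(broadcast("192.168.0.0", 24)))   # 192.168.0.255
```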
ARP
The second part of the problem: when we give an interface the address 192.168.99.17 and define a 192.168.99.0/24 network on that interface, how does the system find the MAC addresses of the other stations?
A part of the TCP/IP protocol family is responsible for this: ARP (Address Resolution Protocol). If you want to send to a station that should be on the local network (because it fits in the configured network range), you send out a broadcast asking all stations whether they have that IP, and the station that has it answers with its MAC address.
A computer usually caches that information for about 2 minutes and then asks again, so if the mapping ever changes it will find out. A station can also send out ARP information on its own, a so-called gratuitous ARP. This is useful when the information changes. Also remember that the switch needs a table telling it which MAC addresses can be found on which port.
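To make the broadcast question more concrete, here is a sketch that builds the bytes of an ARP request for IPv4 over Ethernet. It only constructs the frame and does not send it, since sending raw frames needs root privileges. The function names and the target address 192.168.99.1 are assumptions for this example.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806       # type field value that marks an ARP payload
BROADCAST_MAC = b"\xff" * 6  # ff:ff:ff:ff:ff:ff reaches all stations


def mac_bytes(text):
    """Convert a colon-separated MAC address into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def build_arp_request(src_mac, src_ip, target_ip):
    """Ethernet broadcast frame asking: who has target_ip? Tell src_ip."""
    ethernet_header = struct.pack(
        "!6s6sH", BROADCAST_MAC, mac_bytes(src_mac), ETHERTYPE_ARP
    )
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length in bytes
        4,                            # protocol address length in bytes
        1,                            # operation: 1 = request, 2 = reply
        mac_bytes(src_mac),           # sender MAC
        socket.inet_aton(src_ip),     # sender IP
        b"\x00" * 6,                  # target MAC: unknown, that is the question
        socket.inet_aton(target_ip),  # target IP
    )
    return ethernet_header + arp_body


frame = build_arp_request("b0:35:9f:2a:09:9d", "192.168.99.17", "192.168.99.1")
print(len(frame))        # 42 (14 bytes Ethernet header + 28 bytes ARP)
print(frame[:6].hex())   # ffffffffffff
```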
The tool in Linux to find out what is in the arp cache is called arp. Here is an example output:
(Some tools like this are mainly useful for the root user, so run this as root. You can run it as a normal user as well, but then you might need to give it the full path name, e.g. /sbin/arp or /usr/sbin/arp.)
    $ arp
    $ /usr/sbin/arp
    Address                  HWtype  HWaddress           Flags Mask            Iface
    192.168.5.1              ether   18:d6:c7:f7:f3:2e   C                     wlp3s0
    192.168.5.201            ether   80:ee:73:81:a5:9e   C                     wlp3s0
So here we see that the ARP cache has 2 entries. One is the router and the other is another station on the network that we have communicated with in the last 2 minutes. You see the IP address, the corresponding MAC address, and the network interface where it was found.
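On Linux the kernel also exposes the ARP cache as a text file, /proc/net/arp. The sketch below parses that format in Python. The sample text reuses the two entries from the arp output above, rewritten in the file's column layout, and the function name `parse_arp_table` is made up for this example.

```python
from pathlib import Path


def parse_arp_table(text):
    """Parse text in the format of /proc/net/arp into a list of dicts."""
    entries = []
    for line in text.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, _flags, mac, _mask, iface = fields[:6]
        entries.append({"ip": ip, "mac": mac, "iface": iface})
    return entries


# Sample in the kernel's column layout, with the two entries shown above.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.5.1      0x1         0x2         18:d6:c7:f7:f3:2e     *        wlp3s0
192.168.5.201    0x1         0x2         80:ee:73:81:a5:9e     *        wlp3s0
"""

for entry in parse_arp_table(SAMPLE):
    print(entry["ip"], entry["mac"], entry["iface"])

try:
    live = Path("/proc/net/arp").read_text()    # only present on Linux
except OSError:
    live = ""
print(parse_arp_table(live))
```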
Loopback Interface
Each Linux (and Windows) system has a so-called loopback interface with the address 127.0.0.1. In fact the entire 127.0.0.0/8 range is reserved for loopback. On this interface the computer can talk to itself. This is useful for programs that normally run network services but in some cases should only be used by programs on your own computer. The name for 127.0.0.1 should be localhost. In IPv6 the localhost address is all zeros except that the very last bit is 1: ::1
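A small Python sketch shows what such a local-only service looks like. The server binds to 127.0.0.1, so it cannot be reached from other machines, and a client on the same computer connects to it. The echo behaviour and the function name `serve_once` are made up for this example.

```python
import socket
import threading


def serve_once(server):
    """Accept one connection and echo back whatever arrives first."""
    conn, _addr = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))


# Binding to 127.0.0.1 makes the service reachable only from this machine.
# Port 0 asks the kernel to pick any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)   # b'hello'
```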
Private IP Space: RFC 1918
On your private network at home or in your company you often need networks that are not used in the public internet. For this you can use IP addresses from the ranges defined in RFC 1918:
address range | CIDR | possible subdivision |
---|---|---|
10.0.0.0 to 10.255.255.255 | 10.0.0.0/8 | could be divided into 65536 /24 networks |
172.16.0.0 to 172.31.255.255 | 172.16.0.0/12 | could be divided into 4096 /24 networks |
192.168.0.0 to 192.168.255.255 | 192.168.0.0/16 | could be divided into 256 /24 networks |
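The three ranges can be checked and counted with a short Python sketch based on the standard `ipaddress` module. The names `is_rfc1918` and `count_slash24` and the test addresses are made up for this example. The number of /24 networks in a range is 2 to the power of the number of bits between its prefix length and 24.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """True if the address lies in one of the three private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


def count_slash24(net):
    """Number of /24 networks that fit into net (prefix length <= 24)."""
    return 2 ** (24 - net.prefixlen)


for net in RFC1918_NETWORKS:
    print(net, net[0], net[-1], count_slash24(net))

print(is_rfc1918("192.168.79.105"))   # True
print(is_rfc1918("172.32.0.1"))       # False
print(is_rfc1918("8.8.8.8"))          # False
```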
Exercises
- Find out which interfaces are on your computer. What is your own MAC address on each of the interfaces?
- Find out how many bytes and how many packets have been sent on each of them.
- Look at your ARP cache. What are the MAC addresses of the other stations? Try using ping IP-address to send packets to other stations and see if their MAC addresses show up in the ARP cache.