6.5810: Lab 1

Apply for a CloudLab Account

Apply for an account using the signup link. On the “Person Information” panel, click “Join Existing Project” and fill in our project name “MIT6828”. Your application will be approved by us shortly.

Setting up the experiment environment in CloudLab

We will use m510 machines, each equipped with an eight-core Intel Xeon D-1548 2.0 GHz CPU, 64 GB of ECC memory, 256 GB of NVMe flash storage, and a dual-port Mellanox ConnectX-3 10 Gbps NIC. You can find the detailed hardware description and how the machines are interconnected here and current availability here. In our experience, m510 machines are available most of the time, but start the assignment early so that an availability shortage does not cause you to miss the deadline.

  1. Make a profile containing two “raw-pc” (bare metal PC) m510 machines with the disk image “UBUNTU20-64-STD”, connected by a link. In the topology editor, you can create the link by dragging a line between the machines (called nodes in the editor). Click on the link and disable "allow interswitch mapping" to force the machines to share the same physical switch. Instantiate the experiment using the profile you just created.

Mellanox ConnectX-3 requires the MLX4 poll mode driver library (librte_pmd_mlx4) to poll packets directly from the NIC with DPDK. See the detailed documentation here. To enable the mlx4 driver, you first need to install the Mellanox OFED (OpenFabrics Enterprise Distribution) on the machines.

  1. Install dependencies.
    $ sudo apt-get update
    $ sudo apt-get install meson python3-pyelftools
    
  2. Download the Mellanox OFED for Ubuntu 20.04 x86_64.
    $ wget https://content.mellanox.com/ofed/MLNX_OFED-4.9-5.1.0.0/MLNX_OFED_LINUX-4.9-5.1.0.0-ubuntu20.04-x86_64.tgz
    
  3. Install the Mellanox OFED (this will take 10 minutes or more). You might encounter a few harmless error messages, such as missing packages or a failed firmware update.
    $ tar -xvzf MLNX_OFED_LINUX-4.9-5.1.0.0-ubuntu20.04-x86_64.tgz
    $ cd MLNX_OFED_LINUX-4.9-5.1.0.0-ubuntu20.04-x86_64
    $ sudo ./mlnxofedinstall --upstream-libs --dpdk
    
  4. Reload the driver (this can freeze your SSH connection for a while).
    $ sudo /etc/init.d/openibd restart
    
  5. Verify that it is installed correctly.
    $ ibv_devinfo
    

If the installation succeeded, ibv_devinfo shows two available ports on the machine. Note that port numbers in ibv_devinfo start from 1, but in DPDK they start from 0. On the m510 machine, the first port (port 0 in DPDK) is used for the public (inter-cluster) connection, and the second port (port 1 in DPDK) is used for the private (intra-cluster) connection. To measure the latency between the two machines, we are going to use the second port (port 1 in DPDK) of the NIC.

Now that you have OFED installed, you are ready to build the DPDK library.

  1. Clone the DPDK repository and check out the recent release (DPDK 22.07).
    $ git clone https://github.com/DPDK/dpdk
    $ cd dpdk
    $ git checkout releases
  2. Build the DPDK
    $ meson build -Dexamples=all
    $ cd build
    $ meson configure
    $ ninja
    
  3. Finally, configure huge pages, which DPDK uses to store packets.
    $ echo 1024 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    

Congratulations! Your DPDK library is now ready to use. Before going on, you are encouraged to play with the sample applications in DPDK's examples folder. Their documentation is here. Their compiled binaries are in build/examples.

Warning: The machine will be completely erased when the experiment ends (~16 hours by default). To keep your progress, consider storing your code on GitHub (using a private repo) or on your personal machine. CloudLab also supports creating a disk image of your machine for snapshotting. Note that your home directory will not be saved in disk images, but anything you place in /opt will be included in the image.

DPDK ICMP Ping Server

In this assignment, you will build a server that can respond to ICMP echoes using DPDK. Since DPDK works directly with Ethernet frames, you will have to manually parse and modify the IP and ICMP headers of echo requests. DPDK uses the struct rte_mbuf data structure to store packet buffers. The programming guide explains this data structure here.

If you find it hard to get started, use DPDK's simple L2 forwarding example (examples/skeleton) as a starting point. You will need to change the port initialization code since you will only use port 1 in this experiment (remember that port 0 is needed for Linux and ssh). You have to modify lcore_main() to incorporate your packet parsing and crafting logic into its run-to-completion loop. You might find the following macros and functions useful for your purpose. You can find them in DPDK’s API documentation.

Hint: It may be easier to modify each received packet buffer in place before sending it, rather than creating a new packet buffer.

Hint: IP and ICMP checksums must be updated if you modify the packet. DPDK provides several functions to help calculate checksums.
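Both checksums are the 16-bit one's-complement sum described in RFC 1071. The IP checksum covers only the IP header, while the ICMP checksum covers the ICMP header plus its payload. As a minimal sketch in plain C, independent of DPDK's own helpers, the computation might look like the following. The name inet_checksum is hypothetical, and the sketch assumes that the checksum field inside the buffer was zeroed before the call and that the buffer is shorter than 64 KiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a DPDK API): RFC 1071 Internet checksum.
 * The buffer is read as big-endian 16-bit words, so the returned value
 * should be stored high byte first:
 *   hdr[off] = csum >> 8;  hdr[off + 1] = csum & 0xff;
 * Assumes len < 64 KiB so the 32-bit accumulator cannot overflow. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t sum = 0;
    size_t i;

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);

    /* An odd trailing byte is treated as if padded with a zero byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A handy self-check: running the same function over a header that already contains its correct checksum yields 0.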

Hint: See RFC 792, ICMP echo, for more details about how ping works.

Hint: Make sure your Ethernet header contains the right MAC addresses.
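Taken together, the hints above suggest a small in-place transformation. The sketch below works on a raw byte buffer in plain C rather than on a struct rte_mbuf, and every name in it (echo_request_to_reply, csum_update16, swap_bytes) is hypothetical. It assumes an untagged Ethernet frame carrying IPv4. Instead of recomputing the ICMP checksum from scratch, it uses the incremental update from RFC 1624, since only the type/code word changes (0x0800 to 0x0000). Swapping the source and destination IP addresses leaves the IP header checksum unchanged, because the one's-complement sum does not depend on word order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ETH_HLEN = 14, ETH_ALEN = 6 };

/* RFC 1624 (eqn. 3): new checksum after one 16-bit word covered by the
 * checksum `hc` changes from old_word to new_word. */
static uint16_t csum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Swap two non-overlapping byte ranges of at most ETH_ALEN bytes. */
static void swap_bytes(uint8_t *a, uint8_t *b, size_t n)
{
    uint8_t tmp[ETH_ALEN];
    assert(n <= sizeof(tmp));
    memcpy(tmp, a, n);
    memcpy(a, b, n);
    memcpy(b, tmp, n);
}

/* Hypothetical helper: rewrites an ICMP echo request into an echo reply
 * in place. Returns 1 on success, 0 if the frame is not an echo request. */
int echo_request_to_reply(uint8_t *frame, size_t len)
{
    if (frame == NULL || len < ETH_HLEN + 20 + 8)
        return 0;
    if (frame[12] != 0x08 || frame[13] != 0x00)      /* EtherType IPv4 */
        return 0;

    uint8_t *ip = frame + ETH_HLEN;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;         /* IP header length */
    if ((ip[0] >> 4) != 4 || ihl < 20 || len < ETH_HLEN + ihl + 8)
        return 0;
    if (ip[9] != 1)                                  /* protocol 1 = ICMP */
        return 0;

    uint8_t *icmp = ip + ihl;
    if (icmp[0] != 8 || icmp[1] != 0)                /* type 8, code 0 */
        return 0;

    swap_bytes(frame, frame + ETH_ALEN, ETH_ALEN);   /* dst MAC <-> src MAC */
    swap_bytes(ip + 12, ip + 16, 4);                 /* src IP  <-> dst IP  */

    /* Type 8 -> 0 changes the first ICMP word from 0x0800 to 0x0000. */
    uint16_t old_csum = (uint16_t)((icmp[2] << 8) | icmp[3]);
    uint16_t new_csum = csum_update16(old_csum, 0x0800, 0x0000);
    icmp[0] = 0;
    icmp[2] = (uint8_t)(new_csum >> 8);
    icmp[3] = (uint8_t)(new_csum & 0xff);
    return 1;
}
```

The sketch reuses the request's TTL and other IP fields unchanged. If you modify any of them, the IP header checksum must be updated as well.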

Now let us use Linux's ping tool as the client to test the server you just built. You can generate ping requests on the client machine as follows:

  1. Set up the network on the client machine:
    $ sudo ifconfig eno1d1 192.168.1.2 netmask 255.255.255.0
  2. Create a static ARP entry for the server:
    $ sudo arp -s 192.168.1.3 [your server's eno1d1 MAC address]
  3. Generate a flood of ping requests:
    $ sudo ping -f 192.168.1.3 -c 500000

The last command takes around 10 seconds to complete. If successful, you will see ping statistics with zero packet loss.

Hint: If your code does not work correctly, tcpdump and dpdk-dumpcap are useful tools for debugging. You can use tcpdump for client-side diagnosis and dpdk-dumpcap for server-side diagnosis.

DPDK ICMP Ping Client

Now that you have a working DPDK-based ping server, the next step is to replace Linux's ping client with a DPDK-based ping client. The client mainly does two things: 1) sending an ICMP echo request packet to the server, and 2) receiving the ICMP echo reply packet back from the server. For simplicity, you can hardcode the request packet in the client.
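One way to hardcode the request is a byte-array template that the client copies and patches before sending. The sketch below is plain C with hypothetical names (echo_template, build_echo_request) and makes several assumptions: the client and server addresses 192.168.1.2 and 192.168.1.3 from the ping test above, an untagged Ethernet frame, no ICMP payload, and MAC addresses supplied by the caller. The IP header checksum in the template is precomputed for exactly the field values shown and must be recomputed if any of them change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ECHO_FRAME_LEN = 14 + 20 + 8 };  /* Ethernet + IPv4 + ICMP, no payload */

/* Hypothetical template for 192.168.1.2 -> 192.168.1.3. MAC addresses,
 * ICMP checksum, identifier, and sequence number are filled in later. */
static const uint8_t echo_template[ECHO_FRAME_LEN] = {
    /* Ethernet: dst MAC, src MAC (placeholders), EtherType 0x0800 (IPv4) */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x08, 0x00,
    /* IPv4: version 4, IHL 5, total length 28, TTL 64, protocol 1 (ICMP),
     * header checksum 0xf78b precomputed for these exact field values */
    0x45, 0x00, 0x00, 0x1c, 0x00, 0x00, 0x00, 0x00, 0x40, 0x01, 0xf7, 0x8b,
    192, 168, 1, 2, 192, 168, 1, 3,
    /* ICMP: type 8 (echo request), code 0, checksum, identifier, sequence */
    0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Fills buf (at least ECHO_FRAME_LEN bytes) with a complete echo request. */
void build_echo_request(uint8_t *buf, const uint8_t src_mac[6],
                        const uint8_t dst_mac[6], uint16_t id, uint16_t seq)
{
    assert(buf != NULL && src_mac != NULL && dst_mac != NULL);

    memcpy(buf, echo_template, ECHO_FRAME_LEN);
    memcpy(buf, dst_mac, 6);
    memcpy(buf + 6, src_mac, 6);

    uint8_t *icmp = buf + 14 + 20;
    icmp[4] = (uint8_t)(id >> 8);
    icmp[5] = (uint8_t)(id & 0xff);
    icmp[6] = (uint8_t)(seq >> 8);
    icmp[7] = (uint8_t)(seq & 0xff);

    /* With no payload, the ICMP checksum covers only three nonzero
     * 16-bit words: type/code (0x0800), identifier, and sequence. */
    uint32_t sum = 0x0800u + id + seq;
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    uint16_t csum = (uint16_t)~sum;
    icmp[2] = (uint8_t)(csum >> 8);
    icmp[3] = (uint8_t)(csum & 0xff);
}
```

This frame is 42 bytes long, which is below the 60-byte Ethernet minimum (excluding the frame check sequence). The sketch does not assume that the NIC pads short frames, so a real client may need to append zero padding or carry a payload as Linux's ping does.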

Hint: Run your ping server with the Linux client and print out the received echo request packet.

Performance Measurement

Now that you have a working client and server to perform ICMP echoes, the final step is to measure the time your software and DPDK spend on one echo round trip. You will have to figure out where in the code to place the timing API calls.

Hint: You might have to instrument part of the DPDK mlx4 driver.

For an accurate time measurement, you can read the CPU's time stamp counter (TSC) to get the elapsed cycles and compute the elapsed time as cycles/freq. The sample code below shows how to do this in DPDK.

uint64_t hz = rte_get_timer_hz();
uint64_t begin = rte_rdtsc_precise();
// Do something
uint64_t elapsed_cycles = rte_rdtsc_precise() - begin;
uint64_t microseconds = elapsed_cycles * 1000000 / hz;
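Integer microseconds discard everything below one microsecond, which can be a large share of a measurement when a single round trip is only on the order of microseconds. A minimal sketch of a finer-grained conversion in plain C follows. The helper name cycles_to_ns is hypothetical, and its two arguments are assumed to come from the rte_rdtsc_precise() difference and rte_get_timer_hz() calls shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a TSC cycle count to nanoseconds.
 * `hz` is the timer frequency in cycles per second. Using double keeps
 * the fractional part that integer division would throw away. */
double cycles_to_ns(uint64_t cycles, uint64_t hz)
{
    assert(hz != 0);
    return (double)cycles * 1e9 / (double)hz;
}
```

For example, at 2 GHz a difference of 3 cycles corresponds to 1.5 ns, which the integer microsecond formula would report as 0.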

If you want to learn more, this Intel white paper is a good reference. You’re free to use other methods to measure or infer your system’s performance, but make sure to describe your approach.
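A single round-trip measurement is noisy, so it usually helps to collect many samples and report a few summary statistics. The following plain C sketch is one possible way to do that. The names latency_summary and summarize are hypothetical, the samples are assumed to be elapsed cycle counts (or any other unsigned time unit), and the percentile uses a simple floor-based index rather than any particular standard definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of a set of latency samples. */
struct latency_summary {
    uint64_t min;
    uint64_t median;  /* lower median when the sample count is even */
    uint64_t p99;     /* element at index floor((n - 1) * 99 / 100) */
    uint64_t max;
};

/* qsort comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place (ascending) and returns summary statistics.
 * Requires n > 0. */
struct latency_summary summarize(uint64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    qsort(samples, n, sizeof(samples[0]), cmp_u64);

    struct latency_summary s;
    s.min = samples[0];
    s.median = samples[(n - 1) / 2];
    s.p99 = samples[(n - 1) * 99 / 100];
    s.max = samples[n - 1];
    return s;
}
```

Recording raw cycle counts in a preallocated array during the run and summarizing afterwards keeps the sorting cost out of the timed path.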

Hand-in Instructions

  1. Launch an experiment with two m510 machines on CloudLab. Run your DPDK-based ICMP ping server.
  2. Run Linux’s ping tool as the client and send us its output.
  3. Run your DPDK-based ICMP ping client. What round-trip latency did you observe for an echo, and how does it compare with the latency reported by Linux's ping tool?
  4. For one echo round trip, measure the time spent in your software and in DPDK on both the server side and the client side. Can you measure or infer the latency of the raw networking hardware?
  5. What is the overhead introduced by your DPDK software stack? Do you think your implementation is optimal? If not, which part can be further improved?
  6. Submit your source code (excluding DPDK library) and answers (of Q2-Q5) to 6828seminar-staff@lists.csail.mit.edu. You can submit multiple times before the deadline; the latest one will be used for grading.