6.828: Lab 1

Apply for a CloudLab Account

Apply for an account using the signup link. On the “Personal Information” panel, click “Join Existing Project” and fill in our project name, “MIT6828”. We will approve your application shortly.

Setting up the experiment environment in CloudLab

We are going to use m510 machines, each equipped with an eight-core Intel Xeon D-1548 2.0 GHz CPU, 64 GB of ECC memory, 256 GB of NVMe flash storage, and a dual-port Mellanox ConnectX-3 10 Gbps NIC. You can find the detailed hardware description and how the machines are interconnected here, and current availability here. In our experience, m510 machines are available most of the time, but start the assignment early so that a shortage of machines does not cause you to miss the deadline.

  1. Make a profile containing two “raw-pc” (bare metal PC) m510 machines with Ubuntu 18.04 STD (the default image) connected by a link between them, and instantiate an experiment from it. You can use the topology editor, dragging a line between the machines (called nodes in the editor) to create the link. Click on the link and disable "allow interswitch mapping" to force the machines to share the same physical switch.

The Mellanox ConnectX-3 requires the MLX4 poll mode driver library (librte_pmd_mlx4) to poll packets directly from the NIC with DPDK. See the detailed documentation here. To enable the mlx4 driver, you first need to install the Mellanox OFED (OpenFabrics Enterprise Distribution) on the machines.

  1. Install dependencies.
    $ sudo apt-get update
    $ sudo apt-get install libnuma-dev libnl-3-dev libnl-route-3-dev
  2. Download the Mellanox OFED for Ubuntu 18.04 x86_64.
    $ wget http://content.mellanox.com/ofed/MLNX_OFED-4.6-
  3. Install the Mellanox OFED (this takes 10 minutes or more).
    $ tar -xvzf MLNX_OFED_LINUX-4.6-
    $ cd MLNX_OFED_LINUX-4.6-
    $ sudo ./mlnxofedinstall --upstream-libs --dpdk
  4. Reload the driver.
    $ sudo /etc/init.d/openibd restart
  5. Verify that it is installed correctly.
    $ ibv_devinfo

If the installation succeeded, ibv_devinfo shows two available ports on the machine. On m510 machines, the first port (port 0 in DPDK) is used for the public (inter-cluster) connection, and the second port (port 1 in DPDK) is used for the private (intra-cluster) connection. To measure the latency between the two machines, we are going to use ‘port 1’ of the NIC.

Now that you have OFED installed, you are ready to build the DPDK library.

  1. Clone the DPDK repository and check out the recent release (DPDK 20.08).
    $ git clone https://github.com/DPDK/dpdk
    $ cd dpdk
    $ git checkout releases
  2. Because we are building DPDK with the MLX4 PMD for the Mellanox ConnectX-3 NIC, set ‘CONFIG_RTE_LIBRTE_MLX4_PMD=y’ on line 366 of config/common_base.
    $ vim config/common_base    # modify line 366
  3. Build the DPDK
    $ make config T=x86_64-native-linuxapp-gcc
    $ make -j16
  4. Finally, configure the huge pages in which DPDK stores packets. The command below reserves 1024 pages of 2 MB each (2 GB in total) on NUMA node 0.
    $ echo 1024 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

Congratulations! Your DPDK library is now ready to use. Before going on, you are encouraged to compile and play with DPDK’s sample applications in the examples folder of DPDK. Their documentation is here. When you compile your code, don’t forget to link against dpdk/build/lib and include dpdk/build/include.

Warning: Your DPDK configuration (and any code you develop on the CloudLab machines) will be deleted when the experiment ends (~16 hours by default). Consider storing a copy of your code on GitHub (using a private repo) or on a personal machine. CloudLab can also make a disk image of your machine (a type of snapshot) to save your progress in installing the packages described above. Note that your home directory is not saved in disk images, but you can place data that should be included in the image in /opt.

DPDK ICMP Ping Server

In this assignment, you will build a server that responds to ICMP echo requests using DPDK. Since DPDK works directly with Ethernet frames, you will have to manually parse and modify the IP and ICMP headers of each echo request. DPDK uses the struct rte_mbuf data structure to store packet buffers. The programming guide explains this data structure here.
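
To make the parsing step concrete, here is a minimal sketch in plain C that checks whether a raw Ethernet frame carries an IPv4 ICMP echo request. It deliberately avoids DPDK types so that it stands alone: the function name icmp_echo_request_offset and the constants are hypothetical, and it assumes an untagged Ethernet frame (no VLAN header) and ignores IP fragmentation. In your server you would apply the same checks to the packet data of an rte_mbuf, and you may prefer DPDK's own header structs (for example struct rte_ether_hdr, struct rte_ipv4_hdr, and struct rte_icmp_hdr) over raw byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN     14u     /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define IPV4_MIN_HDR    20u     /* IPv4 header without options */
#define ICMP_HDR_LEN    8u      /* type, code, checksum, id, sequence */
#define ETHERTYPE_IPV4  0x0800u
#define IP_PROTO_ICMP   1u
#define ICMP_ECHO_REQ   8u

/* Hypothetical helper: returns the byte offset of the ICMP header if `frame`
 * is an IPv4 ICMP echo request, or -1 otherwise.
 * Assumes no VLAN tag; does not look at IP fragmentation fields. */
int icmp_echo_request_offset(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);

    if (len < ETH_HDR_LEN + IPV4_MIN_HDR)
        return -1;

    /* EtherType is stored big-endian at bytes 12..13. */
    unsigned ethertype = ((unsigned)frame[12] << 8) | frame[13];
    if (ethertype != ETHERTYPE_IPV4)
        return -1;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4)                      /* IP version */
        return -1;

    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;    /* header length in bytes */
    if (ihl < IPV4_MIN_HDR || len < ETH_HDR_LEN + ihl + ICMP_HDR_LEN)
        return -1;

    if (ip[9] != IP_PROTO_ICMP)                 /* protocol field */
        return -1;

    const uint8_t *icmp = ip + ihl;
    if (icmp[0] != ICMP_ECHO_REQ || icmp[1] != 0)   /* type 8, code 0 */
        return -1;

    return (int)(ETH_HDR_LEN + ihl);
}
```

Returning the ICMP offset rather than a boolean lets the caller go straight on to rewriting the header without recomputing the IP header length.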

If you find it hard to get started, use the simple L2 forwarding example (examples/skeleton) we discussed in the tutorial as a starting point. You will need to change the port initialization code since you will only use port 1 in this experiment (remember that port 0 is needed for Linux and ssh). You have to modify lcore_main() to incorporate your packet parsing and crafting logic into its run-to-completion loop. You might find some of DPDK’s macros and functions useful for this purpose; you can find them in DPDK’s API documentation.

Hint: It may be easier to modify each received packet buffer in place before sending it, rather than creating a new packet buffer.

Hint: IP and ICMP checksums must be updated if you modify the packet. DPDK provides several functions to help calculate checksums.
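
For reference, the checksum used by both the IP header and ICMP is the 16-bit ones' complement sum described in RFC 1071. The sketch below is a standalone version in plain C; the name inet_checksum is hypothetical, and it assumes the checksum field inside the buffer has been zeroed before the call. DPDK's rte_ip.h offers helpers such as rte_ipv4_cksum() and rte_raw_cksum(); check the API documentation for their exact semantics before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: RFC 1071 Internet checksum over `len` bytes.
 * The result is in host order; store it big-endian (high byte first)
 * in the header. The 32-bit accumulator is ample for Ethernet-sized
 * packets. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t sum = 0;
    size_t i;

    /* Add up the data as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A useful sanity check while debugging: running the same function over a header that already contains a correct checksum yields 0.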

Hint: See RFC 792, ICMP echo, for more details about how ping works.

Hint: Make sure your Ethernet header contains the right MAC addresses.
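
Putting the hints above together, one possible way to turn a request into a reply in place is sketched below in plain C. The function names are hypothetical, and the sketch assumes an untagged Ethernet frame whose ICMP header offset is already known (34 for an IPv4 header without options). It swaps the MAC and IP addresses, sets the ICMP type to 0 (echo reply), and patches the ICMP checksum incrementally following RFC 1624 instead of recomputing it. Swapping the source and destination IP addresses leaves the IP header checksum unchanged as long as no other IP field (such as the TTL) is modified, because the checksum is a sum of 16-bit words and does not depend on their order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap two non-overlapping byte ranges of length n. */
static void swap_bytes(uint8_t *a, uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

/* Hypothetical helper: rewrite an ICMP echo request frame into an echo
 * reply in place. `icmp_off` is the offset of the ICMP header in the frame.
 * Assumes the frame was already validated as an echo request. */
void echo_request_to_reply(uint8_t *frame, size_t icmp_off)
{
    assert(frame != NULL && icmp_off >= 34);

    swap_bytes(frame, frame + 6, 6);                  /* dst MAC <-> src MAC */
    swap_bytes(frame + 14 + 12, frame + 14 + 16, 4);  /* src IP  <-> dst IP  */

    uint8_t *icmp = frame + icmp_off;
    icmp[0] = 0;                                      /* type 8 -> type 0 */

    /* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), where the changed 16-bit
     * word is type/code: m = 0x0800 (request), m' = 0x0000 (reply). */
    uint16_t old_hc = (uint16_t)(((unsigned)icmp[2] << 8) | icmp[3]);
    uint32_t sum = (uint16_t)~old_hc;
    sum += 0xF7FFu;                                   /* ~m; m' adds nothing */
    sum = (sum & 0xFFFF) + (sum >> 16);               /* end-around carry */
    uint16_t new_hc = (uint16_t)~sum;

    icmp[2] = (uint8_t)(new_hc >> 8);                 /* store big-endian */
    icmp[3] = (uint8_t)(new_hc & 0xFF);
}
```

Recomputing the ICMP checksum over the whole message works just as well; the incremental update simply avoids touching the payload, which may matter when you later measure processing time.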

For simplicity, we recommend building an echo server in DPDK, and then using the standard Linux network stack on the other machine to send ICMP echo packets to it (acting as a client), rather than building a separate client in DPDK. You can generate ping packets on the client machine as follows:

  1. Set up the network on the client machine:
    $ sudo ifconfig eno1d1 netmask
  2. Create a static ARP entry for the server:
    $ sudo arp -s [your server's eno1d1 MAC address]
  3. Generate a flood of ping requests:
    $ sudo ping -f

If successful, your echo server will respond to each ping, and you’ll see ping statistics and no packet loss. You may find that tcpdump is a useful tool for debugging whether the client is receiving your server’s responses.

Performance Measurement

Now that you have a working server that responds to ICMP ping requests, measure the total time your software and DPDK take to process each request. You have to work out where in the code to place the timing calls.

Hint: You might have to instrument part of the DPDK mlx4 driver.

For an accurate time measurement, you can read the CPU's time stamp counter (TSC) to get the elapsed cycles and calculate the elapsed time as cycles/freq. The following sample code shows how to do that in DPDK.

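#include <rte_cycles.h>

uint64_t hz = rte_get_timer_hz();      // cycles per second of the default timer
uint64_t begin = rte_rdtsc_precise();  // read the TSC after a memory barrier
// Do something
uint64_t elapsed_cycles = rte_rdtsc_precise() - begin;
// Integer division: anything shorter than 1 microsecond truncates to 0.
uint64_t microseconds = elapsed_cycles * 1000000 / hz;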
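
Processing a single packet can take less than a microsecond, in which case the integer microsecond value in the sample above comes out as 0. The sketch below shows one way to keep sub-microsecond resolution by converting cycle counts to nanoseconds in floating point and averaging over many samples. It is plain C with no DPDK dependency; the helper names cycles_to_ns and mean_ns are hypothetical, and the cycle counts and frequency would come from rte_rdtsc_precise() and rte_get_timer_hz() in your server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to nanoseconds, given the
 * counter frequency in Hz. */
double cycles_to_ns(uint64_t cycles, uint64_t hz)
{
    assert(hz > 0);
    return (double)cycles * 1e9 / (double)hz;
}

/* Hypothetical helper: mean duration in nanoseconds over n cycle-count
 * samples (for example, one sample per processed packet). */
double mean_ns(const uint64_t *samples, size_t n, uint64_t hz)
{
    assert(samples != NULL && n > 0);

    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += samples[i];

    return cycles_to_ns(total, hz) / (double)n;
}
```

For example, 500 cycles on a 2.0 GHz counter is 250 ns, which the integer microsecond formula would report as 0. Recording the raw cycle counts in the hot loop and converting them afterwards keeps floating-point work out of the code being measured.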

If you want to learn more, this Intel white paper is a good reference. You’re free to use other methods to measure or infer your system’s performance, but make sure to describe your approach.

Hand-in Instructions

  1. Launch an experiment with two m510 machines on CloudLab. Run your DPDK-based ICMP ping echo program on the server.
  2. Run Linux’s ping command on the client and send us its output.
  3. For one echo round trip, measure the time spent in your software and DPDK versus the time spent in the rest of the network and the Linux client. How could you measure or infer the latency of the raw networking hardware?
  4. What is the overhead introduced by your DPDK software stack? Do you think your implementation is optimal?
  5. Submit your source code (excluding the DPDK library) and your answers (to Q2, Q3, and Q4) to 6828seminar-staff@lists.csail.mit.edu. You can submit multiple times before the deadline; the latest submission will be used for grading.

Optional Challenge: Try building and using the latest version of Shenango to send a ping to your DPDK server. See tests/test_ping.c. How does its latency compare to Linux?

** Configurations required to build Shenango on m510 in CloudLab