Concurrency and distributed systems

Formal methods

Programming languages

File systems / Storage

I/O stack

Question: when issuing a file operation, how does the kernel know whether the data is already in the page cache? Does it search the buffer heads?

BIO layer

The mapping between the page cache and block devices is done in units of blocks. A block's size ranges from the sector size (512 B) up to the page size (4 KB). To build such a mapping, the buffer head was introduced; it records the information of each block, including which device and block number it persists to and which page it is cached in.
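For reference, the buffer head's core fields look roughly like this (abridged from the kernel's include/linux/buffer_head.h; only a sketch, the real struct has more members):

```c
struct buffer_head {
    unsigned long b_state;            /* buffer state flags (uptodate, dirty, ...) */
    struct buffer_head *b_this_page;  /* circular list of the page's buffers */
    struct page *b_page;              /* the page this block is cached in */
    sector_t b_blocknr;               /* start block number on the device */
    size_t b_size;                    /* size of the mapping (the block size) */
    char *b_data;                     /* pointer to the block's data within the page */
    struct block_device *b_bdev;      /* the device the block persists to */
    ...
};
```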

BIO -> I/O scheduler -> merged request -> request queue (requests queued sequentially)

When flushing the page cache to a block device or reading from a block device into the page cache, a BIO struct is generated; it builds the connection between a set of non-contiguous pages in memory and a set of contiguous sectors on the block device.

```c
struct bio {
    ...
    struct bvec_iter    bi_iter;    /* current position within bi_io_vec */
    unsigned short      bi_vcnt;    /* number of bio_vecs */
    struct bio_vec      *bi_io_vec; /* the actual vector of page segments */
    ...
};

struct bio_vec {
    struct page     *bv_page;   /* the page this segment lives in */
    unsigned int    bv_len;     /* length of the segment in bytes */
    unsigned int    bv_offset;  /* offset of the segment within the page */
};

struct bvec_iter {
    sector_t        bi_sector;      /* device address in 512 byte sectors */
    unsigned int    bi_size;        /* residual I/O count */
    unsigned int    bi_idx;         /* current index into bvl_vec */
    unsigned int    bi_bvec_done;   /* number of bytes completed in current bvec */
};
```
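As a hedged sketch of how these pieces fit together (kernel APIs around v5.x; note that bio_alloc()'s signature changed in v5.18), a driver reading one page from a device might build and submit a BIO like this:

```c
#include <linux/bio.h>

/* Sketch: read one page from `sector` of `bdev` into `page`.
 * Uses the pre-v5.18 bio_alloc(gfp, nr_vecs) signature. */
static void read_one_page(struct block_device *bdev, sector_t sector,
                          struct page *page)
{
    struct bio *bio = bio_alloc(GFP_KERNEL, 1);

    bio_set_dev(bio, bdev);                 /* device holding the contiguous sectors */
    bio->bi_iter.bi_sector = sector;        /* device-side start address */
    bio->bi_opf = REQ_OP_READ;              /* direction of the transfer */
    bio_add_page(bio, page, PAGE_SIZE, 0);  /* memory side: pages need not be contiguous */
    submit_bio(bio);                        /* hand off to the block layer / I/O scheduler */
}
```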

Storage organization types

- Block storage : stores data in fixed-length blocks, without any other information.
- Object storage : metadata and data of objects are organized in a flat namespace. It suits scenarios where there is no hierarchical relationship among the data.
- File storage : files, as units, are organized in a hierarchy.
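The three models above can be contrasted with hypothetical C signatures; this is only an illustration, none of these functions is a real API:

```c
/* Hypothetical signatures, for contrast only -- not a real API. */

/* Block storage: address fixed-size blocks by logical block address; no metadata. */
int read_block(int dev, unsigned long lba, void *buf, size_t nblocks);

/* Object storage: flat namespace mapping a key to data plus metadata. */
int get_object(const char *bucket, const char *key, void *buf, size_t len);

/* File storage: hierarchical paths, e.g. open_file("/a/b/c.txt", 0). */
int open_file(const char *path, int flags);
```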

Storage device types

| Hierarchy         | Storage Technique                  | Bus   | Interface to CPUs    | Speed (approx. latency) |
| ----------------- | ---------------------------------- | ----- | -------------------- | ----------------------- |
| Register          | -                                  | -     | -                    | < 1 ns                  |
| Cache             | SRAM                               | -     | -                    | ~1-30 ns                |
| Main Memory       | DRAM                               | DIMM  | DDR3/4/5             | ~100 ns                 |
| Persistent Memory | Optane (aka 3D XPoint), Flash NAND | DIMM* | DDR3/4/5             | ~100s of ns             |
| Flash Disk        | Flash NAND                         | PCIe* | NVMe (based on PCIe) | ~10-100 us              |
| Traditional Disk  | Magnetic disk                      | AHCI* | SATA                 | ~ms                     |

- See [CPU arch](#Architecture/hardware) for details on the buses.
- *Dual In-Line Memory Module (DIMM)
- *Peripheral Component Interconnect Express (PCIe)
- *Advanced Host Controller Interface (AHCI)
- Reference
  - [Modern Storage Hierarchy: From NAND SSD to *3D XPoint* (*Optane*) PM](https://www.josehu.com/technical/2021/01/01/ssd-to-optane.html)

RAID, device mapper

- Reference
  - http://www.ssdfans.com/?p=8210
  - https://blog.csdn.net/fengxiaocheng/article/details/103258791
  - https://www.cnblogs.com/zhangsanlisi411/articles/16751546.html

Linux kernel / OS kernel

Macro/Micro/Exo kernels

Virtualization

How is rdtsc implemented in a KVM guest?

Kernel bypass

Architecture/hardware

Basic concepts

IO devices and PCIe

IOMMU and DMA

Data transfer between memory and hardware subsystems other than the CPU, such as between peripheral devices and memory, consumes many CPU cycles. Although this problem could be mitigated with more CPU cores, there is a more efficient solution: direct memory access (DMA). DMA transfers data between hardware subsystems and memory directly, without CPU intervention. The CPUs are thus freed for other complex computations, while the customized, less complex, and therefore fast DMA hardware (which can be regarded as an "accelerator") takes care of transferring data.
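As a hedged sketch of what this looks like from a Linux driver (a streaming mapping via the DMA API; see the DMA-API-HOWTO linked below), the CPU only sets up and tears down the mapping while the device moves the data itself:

```c
#include <linux/dma-mapping.h>

/* Sketch: map `buf` for a CPU-to-device transfer, let the device DMA it,
 * then unmap. The actual device programming is driver-specific. */
static int dma_send(struct device *dev, void *buf, size_t len)
{
    dma_addr_t handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

    if (dma_mapping_error(dev, handle))
        return -ENOMEM;

    /* ... program the device with `handle` and kick off the transfer ... */

    dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
    return 0;
}
```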

// TODO Bus, DMA controller, DMA engine. ref: An overview of direct memory access

// TODO cache coherence of DMA: https://www.quora.com/Can-a-DMA-engine-read-from-the-cache-What-are-the-pros-and-cons-of-this-approach

The impact of DMA on the CPU is mainly memory bus contention between the DMA engine and the CPU. In practice there is little such contention, because the CPU serves most of its accesses from its caches, leaving the memory bus idle much of the time, and the DMA controller can slip its transfers into those idle bus cycles (cycle stealing).

IOMMU: used for virtualization and for limiting which memory regions devices may access. See https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt

- Reference
  - https://www.spiceworks.com/tech/hardware/articles/direct-memory-access/
  - https://www.intel.com/content/www/us/en/developer/articles/technical/memory-in-dpdk-part-2-deep-dive-into-iova.html

Pipeline and hyper-threads

Memory (consistency) model


Cache identification

Cache coherence

Keyword Volatile

The keyword volatile tells the compiler not to optimize accesses to the variable. The optimizations it suppresses include, for example:

- Register Caching and Loop Hoisting
```c
int flag = 0;

void updateFlag() {
    flag = 1;  // The compiler may keep this write in a register
}

void checkFlag() {
    while (flag == 0) {
        // The compiler may hoist the load of flag out of the loop (assuming
        // it is always 0) and spin forever, never observing the write
    }
}
```


- Write Coalescing and Reordering
```c
int status = 0;
int dataReady = 0;

void setStatus() {
    status = 1;
    status = 2;  // The compiler may coalesce these two writes into just `status = 2`
}

void pollStatus() {
    for (int i = 0; i < 1000; i++) {
        dataReady = 1;  // The compiler may hoist this store out of the loop
                        // or collapse it to a single write
    }
}
```
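Declaring the variable volatile forces every access to go to memory, which suppresses both optimizations above (note, though, that volatile alone provides no atomicity or cross-thread ordering). A minimal sketch:

```c
volatile int flag = 0;

void checkFlagVolatile() {
    while (flag == 0) {
        // Each iteration re-loads flag from memory, so a store made by an
        // interrupt handler or another thread is eventually observed
    }
}
```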

Performance

Database

ACID of transactions

Transactional memory

Transactional memory only guarantees atomicity, isolation, and partial consistency. Partial consistency means it only ensures memory-level consistency, not application-level logical consistency (e.g., that a credit balance cannot go negative).
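As a hedged illustration of "atomic but only memory-level consistent", here is a sketch using Intel RTM intrinsics (assumes TSX hardware and compiling with -mrtm; the fallback path is simplified):

```c
#include <immintrin.h>

static int balance = 100;

/* The hardware guarantees these reads/writes commit atomically, but the
 * application-level rule "balance must not go negative" is still our own
 * explicit check -- transactional memory does not enforce it for us. */
int withdraw(int amount)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        if (balance - amount < 0)
            _xabort(0xff);      /* abort on an application-logic violation */
        balance -= amount;
        _xend();                /* commit the transaction */
        return 0;
    }
    return -1;                  /* aborted: retry or fall back to a lock */
}
```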

Ways to ensure transactions