xv6book Notes(C1-4)
一些细节和思考:
Q: wait for reading the source code and thinking
A: after reading the source code and thinking
Chapter 1
1.1
close syscall just "Release the file fd", don't really operate on the file itself.
and correspandingly, the open syscall just ask for a fd and return
fd isn't something new, just the an "alias" or "abstraction" of file store location in real world
Q: Is fd a key-value mapping of inode which the kernel mainteins?
wait syscall's full def
int wait(int* status);
// wait for a child to exit; exit status in *status, returns child PID
exit syscall
int exit(int status);
// terminate the current process; status reported to wait().No return
// return 0 conventionally to indicate success, 1 to indicate failure
Q: Is no return means the end of the process? FurtherMore, after exit syscall, OS is doing what during waiting(or scheding)? release resources, specific?
wait return -1 when the process has no children(when return? killed or exited)
exec syscall
"replacing the calling process's memory with a new memory image loaded file stored in the file sys"
not magic, the file must have some specific format to indicate which part of the file holds instruction, which part of the file holds data... etc
xv6 use ELF format
Link: Chapter 3
the pipe implement in suer shell
assume that we have pipe syscall already
cmd, after parse, has been given a cmd type
case PIPE:
pcmd = (struct pipecmd *)cmd;
// pipecmd is just
/*
struct pipecmd{
int* type; // enum actually
cmd* leftcmd;
cmd* right;
// Q: cmd only has an element "type", why?
}
*/
if (pipe(p) < 0)
panic("pipe");
if (fork1() == 0) {
close(1);
// close the write port
dup(p[1]);
// duplicate the write port of the pipe, now it will occupy the 1
// close the other pipe fd
close(p[0]);
close(p[1]);
runcmd(pcmd->left);
// now run the cmd
}
if (fork1() == 0) {
// same
close(0);
dup(p[0]);
close(p[0]);
close(p[1]);
runcmd(pcmd->right);
}
close(p[0]);
close(p[1]);
wait(0);
wait(0);
break;
Attention: It calls twice fork. The left of pipe and the right of pipe are both the children of shell.
And there' s a recursion here(runcmd):
the cmd struct only needs an element "type" because the cmd runs only when its type is "EXECCMD"
struct execcmd {
int type;
char *argv[MAXARGS];
char *eargv[MAXARGS];
};
The entire shell actual only has such a simple core in main function
// Read and run input commands.
while (getcmd(buf, sizeof(buf)) >= 0) {
// ...
if (fork1() == 0) // fork a child process
runcmd(parsecmd(buf)); // run cmd
wait(0); // wait for the child be killed or exit
}
// never return
So that's a reason why Unix(xv6) separate fork and exec. We need the possibility to execute some code between fork and exec, like implementing pipe(doing some operation on fd)
Also, another reason is to waste the resources: Based on COPY-ON-WRITE
xv6 allocates the memory of user implicitly using fork and exec
user process can also explicitly call sbrkto ask for more memory
1.2
A file descriptor is a small integer representing a kernel-managed object that a process may read/write to
source:
- open: file, dir, device
- create: pipe
- dup
An abstraction of a file, device, pipe, making them all look like streams of bytes
The shell ensures three fd open(which is the shell process's stdin, stdout and stderr) (console)
There's an offset binding with each fd, making it possible to write code like
if(fork()==0){
write(fd, "Hello ", 6);
}
else{
wait(0);
write(fd, "world", 6);
}
fork && dup will copy the fd and the offset
dup is used to implement the shell command like this
[somecommand] 2>&1
which means redirect the err message to the stdout
It can be implemented by
close(2);
fd = dup(1);// so it copies the 1's fd and offset, and occupy the loc of 2(stderr).
1.3
A pipe is a kernel buffer that exposed to process as a pair of file discriptors
If there is no data available, the read side of a pipe will be blocked. A read on a pipe waits:
- write data
- the ref count of the write fd of the pipe is 0
So it's very important to close write fd in the child process
if one of wc's fd refers to the pipe, the wc would never see the end-of-file
验证:
#include <stdio.h>
#include <unistd.h>
int main() {
// wc the std input
int p[2];
char *argv[2];
argv[0] = "wc";
argv[1] = NULL;
pipe(p);
if (fork() == 0) {
// child process
printf("switched to child process\n");
close(0);
dup(p[0]);
close(p[0]);
close(p[1]); // 此处若注释......
execvp(argv[0], argv);
} else {
printf("switched to parent process\n");
close(p[0]);
write(p[1], "hello world\n", 11);
close(p[1]);
}
return 0;
}
结果是
❯ gcc pipe.c
❯ ./a.out
switched to parent process
switched to child process
0 2 11
如果将close(p[1])注释掉,不会输出 0 2 11, 因为 wc 根本没看到 end-of-file(管道写端引用计数不为 0),确 实如此
这里其实有一些有趣的事情:
- 代码之中没有使用
wait,因而不保证父子进程的先后顺序;但是,如果先切换到子进程,会因为管道内部没有可读的数据而堵塞,再次发生进程调度。 - 可能还有疑问,如果父进程在 write 中间发生进程切换呢?实际上是不会发生的,也就是说
write是 atomic operation, 在 Unix 和 Linux 系统中,对于管道和普通文件的 write 操作,如果写入的数据量小于 PIPE_BUF(通常是 4096 字节),那么 write 操作是原子的
read the code of sh.c 100
Thus, the shell may create the tree of process; The leaves of tree are command and the interior nodes are processes that wait the left and the right children complete.
书下面几段对这种 fork 两次,左右中的执行顺序的 design 也做了很有意思的探讨
pipe's advtanges over temp file(why echo hi | wc is better than echo hi > tmp; wc < tmp)
- pipes clean themselves up
- pipes can pass arbitrarily long streams of data, while tmp file need enough space in disk and may be slower
- pipes allows paralel execution(recall the pipe tree)
- pipe's blocking reads
1.4
file system: a tree starting at root
tree node: data file: uninterpreted byte arrays
dir: a named ref of data files and other dir
chdir and open syscall
create new files or dirs?
open("path", O_CREATE)mkdir("path")mknod("path", major-dev-number, minor-dev-number)
mknod 这个 syscall 比较特殊,可以用来创建命名管道 FIFO 或者设备文件
命名管道就是带名字的管道
设备文件分成两种
在 Unix 和 Linux 系统中,设备被表示为设备文件,这是一种特殊的文件类型。设备文件分为两种:字符设备文件和块设备文件。
- 字符设备文件:提供不带缓冲的、串行的数据访问,应用程序读写的是设备提供的原始数据。键盘和鼠标等输入设备通常是字符设备。
- 块设备文件:提供带缓冲的、随机访问的数据访问,数据以块为单位进行读写。硬盘和光驱等存储设备通常是块设备。
设备文件与普通文件的主要区别在于,设备文件提供了一种访问硬件设备的接口。当你对设备文件进行读写操作时,实际上是在与相应的设备进行通信。
主设备号和次设备号是设备文件的重要属性,它们用于标识系统中的设备。
- 主设备号:用于标识设备的类型或设备驱动。例如,所有的 SCSI 硬盘都有相同的主设备号。
- 次设备号:用于标识同一类型的设备中的具体设备。例如,系统中的第一个 SCSI 硬盘的次设备号是 0,第二个 SCSI 硬盘的次设备号是 1,以此类推。
设备号是设备驱动程序使用的,应用程序通常不需要直接处理设备号。当你打开一个设备文件时,系统会根据设备文件的设备号找到相应的设备驱动,然后由设备驱动处理你的读写请求。(也就是说内核维护了一个设备号->驱动的 jump table)
文件和文件名在文件系统之中是两个东西:
- 文件 file: 是存储在磁盘上的 metadata,或者叫 inode
- 文件名:是对文件的一个 link 或者 ref, 文件名是由文件夹 dir 来维护的一个 dir entry
struct dirent {
ushort inum;
char name[DIRSIZ];
};
#define T_DIR 1 // Directory
#define T_FILE 2 // File
#define T_DEVICE 3 // Device
struct stat {
int dev; // File system's disk device
uint ino; // Inode number
short type; // Type of file
short nlink; // Number of links to file
uint64 size; // Size of file in bytes
};
file 是由一些类型信息、权限信息、引用计数等信息+data 组成的(其中的 data 就是 inode)
struct file {
enum { FD_NONE, FD_PIPE, FD_INODE, FD_DEVICE } type;
int ref; // reference count
char readable;
char writable;
struct pipe *pipe; // FD_PIPE
struct inode *ip; // FD_INODE and FD_DEVICE
uint off; // FD_INODE
short major; // FD_DEVICE
};
// in-memory copy of an inode
struct inode {
uint dev; // Device number
uint inum; // Inode number
int ref; // Reference count
struct sleeplock lock; // protects everything below here
int valid; // inode has been read from disk?
short type; // copy of disk inode
short major;
short minor;
short nlink;
uint size;
uint addrs[NDIRECT+1];
};
link : 给指定的 inode 加上另一个名字
open("a", O_CREATE);
link("a", "b"); // 现在名字“b”也指向“a”对应的inode了
unlink: 删除一个名字
一个有趣的应用是创建临时文件
open("temp", O_CREATE|O_RDWR);
unlink("temp");
此时只有 open 返回的或者其它地方的拿到的 fd 还能访问临时文件
当这个进程退出,回收 fd 的时候,temp 这个文件的引用计数为 0,会被清除
一个小细节,shell 大部分的命令都是放在其他文件之中调用来实现的,即 fork+exec,但 cd 是在 shell 里面的
int main(void) {
static char buf[100];
int fd;
// Ensure that three file descriptors are open.
while ((fd = open("console", O_RDWR)) >= 0) {
if (fd >= 3) {
close(fd);
break;
}
}
// Read and run input commands.
while (getcmd(buf, sizeof(buf)) >= 0) {
if (buf[0] == 'c' && buf[1] == 'd' && buf[2] == ' ') {
// Chdir must be called by the parent, not the child.
buf[strlen(buf) - 1] = 0; // chop \n
if (chdir(buf + 3) < 0)
fprintf(2, "cannot cd %s\n", buf + 3);
continue;
}
if (fork1() == 0)
runcmd(parsecmd(buf));
wait(0);
}
exit(0);
}
因为 cd 的时候是 shell 进程 cd 而不是子进程 cd
1.5
Unix syscall interface standard: POSIX(Portable Operating System Interface)
lab01 Utils
boot xv6:照做即可,或参照踩坑记录
sleep: 知道 user.h 是 syscall 定义的地方即可,可以了解一下 syscall 是啥意思
pingpong: 注意 child process 和 parent process 的收发顺序,不要让两个管道都 block 了,除此之外,注意打印的 fd 是什么
参考代码实现
#include "kernel/stat.h"
#include "kernel/types.h"
#include "user/user.h"
int main() {
// ping-pong: two thread byte communication
// note the pipe is unidirectional
int p2c_pipe[2];
int c2p_pipe[2];
pipe(p2c_pipe);
pipe(c2p_pipe);
if ((fork()) == 0) {
// child process
close(0);
dup(p2c_pipe[0]);
close(p2c_pipe[0]);
close(p2c_pipe[1]);
const char *send = "pong\n";
char *received = malloc(5);
int pid = getpid();
read(0, received, 5);
fprintf(1, "%d: received %s", pid, received);
write(c2p_pipe[1], send, sizeof(send)); // write back
free(received);
} else {
close(0);
dup(c2p_pipe[0]);
close(c2p_pipe[0]);
close(c2p_pipe[1]);
// keep the STDOUT's fd: 1
const char *send = "ping\n";
write(p2c_pipe[1], send, sizeof(send));
char *received = malloc(5);
int pid = getpid();
read(0, received, 5);
fprintf(1, "%d: received %s", pid, received);
free(received);
}
exit(0);
}
primes: 这个 lab 有一些难度
在第一次做的时候其实没搞懂
首先要理解题目意思:背景材料给出了这样的素数并行筛法
p = get a number from left neighbor
print p
loop:
n = get a number from left neighbor
if (p does not divide n)
send n to right neighbor
其中每一次筛之间都是管道,而每一轮筛本身是一个进程
也就是说我们要在某个循环之中,完成 "从上一个管道的输入读取上一次筛剩下的数" 到 "采用读取到的第一个数作为新的筛子,并将它打印" 到 “新建管道,向管道里面写入数据”
而最关键的其实在于:
- 我们怎么样去指示读到的数是不是第一个,怎样判定前面的数已经读完和整个方法已经结束
- 我们怎么处理新建管道和新建子进程,包括它们之间的先后顺序
- 每个子进程都有自己的堆和栈,所以上述的标识符应该要么是本地量,要么是管道这种全局维护的
还有就是注意提示,例如:
- 提示里面说了
Be careful to close file descriptors that a process doesn't need, because otherwise your program will run xv6 out of resources before the first process reaches 35.这实际上说明了当你的输出里面$符号夹在输出中间的时候,是因为你的祖先进程没有等到所有的子进程都结束就退出了(所以 xv6 shell 认为要执行下一次命令输入,输出了$) - Hint:
readreturns zero when the write-side of a pipe is closed. 这一条实际上提示了循环的写法
下面给出个人实现:
#include "kernel/stat.h"
#include "kernel/types.h"
#include "user/user.h"
int main() {
int curNumber = -1;
int p[2];
pipe(p);
int cnt = 0;
int div = 2;
if (fork() == 0) {
close(0);
dup(p[0]);
close(p[0]);
close(p[1]);
int newpipe[2];
int ret = 0;
int flag = 0; // if there are primes in the pipe.
// now pipe input is fd: 0
while (1) {
ret = read(0, &curNumber, sizeof(int));
// fprintf(1, "cnt=%d, curNumber=%d\n", cnt, curNumber);
// all commented fprintf are some log info
cnt++;
if (cnt == 1) {
div = curNumber; // the prime to div
pipe(newpipe); // new pipe
}
if (ret == 0) {
// the parent process has written all the numbers and exit
if (flag == 0) {
// no prime left
// fprintf(1, "over\n");
close(p[0]);
close(p[1]);
close(newpipe[0]);
close(newpipe[1]);
break;
} else if (fork() == 0) {
// the new child process(next stage)
// reset cnt and flag
cnt = 0;
flag = 0;
close(0);
dup(newpipe[0]);
close(newpipe[0]);
close(newpipe[1]);
} else {
close(0); // release the pipe and process
close(1);
close(p[0]);
close(p[1]);
wait(0); // wait for the latest process to exit, a wait "stack"
exit(0);
}
} else {
if (cnt == 1) {
// fprintf(1, "process: %d\n", getpid());
fprintf(1, "prime %d\n", curNumber);
}
if (curNumber % div != 0) {
write(newpipe[1], &curNumber, sizeof(int));
flag = 1;
}
}
}
}
close(p[0]);
for (int i = 2; i <= 35; i++) {
write(p[1], &i, sizeof(int));
}
close(p[1]);
wait(0);
exit(0);
}
实际上是,先创建 pipe,然后把数据放在 pipe 里面,等到 read 返回一个 0(父进程写完,管道读完,管道写端已经没有引用)的时候,去 fork 一个进程,然后“承接”这个管道的数据
find:
根据它的 hint 去看 ls.c,依葫芦画瓢即可
xargs: