Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Walking page tables of a process in Linux

Tags:

i'm trying to navigate the page tables for a process in linux. In a kernel module i realized the following function:

static struct page *walk_page_table(unsigned long addr)
{
    pgd_t *pgd;
    pte_t *ptep, pte;
    pud_t *pud;
    pmd_t *pmd;

    struct page *page = NULL;
    struct mm_struct *mm = current->mm;

    pgd = pgd_offset(mm, addr);
    if (pgd_none(*pgd) || pgd_bad(*pgd))
        goto out;
    printk(KERN_NOTICE "Valid pgd");

    pud = pud_offset(pgd, addr);
    if (pud_none(*pud) || pud_bad(*pud))
        goto out;
    printk(KERN_NOTICE "Valid pud");

    pmd = pmd_offset(pud, addr);
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        goto out;
    printk(KERN_NOTICE "Valid pmd");

    ptep = pte_offset_map(pmd, addr);
    if (!ptep)
        goto out;
    pte = *ptep;

    page = pte_page(pte);
    if (page)
        printk(KERN_INFO "page frame struct is @ %p", page);

 out:
    return page;
}

This function is called from the ioctl and addr is a virtual address in process address space:

static int my_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long addr)
{
   struct page *page = walk_page_table(addr);
   ...
   return 0;
}

The strange thing is that calling ioctl in a user space process, this segfaults...but it seems that the way i'm looking for the page table entry is correct because with dmesg i obtain for example for each ioctl call:

[ 1721.437104] Valid pgd
[ 1721.437108] Valid pud
[ 1721.437108] Valid pmd
[ 1721.437110] page frame struct is @ c17d9b80

So why the process can't complete correcly the `ioctl' call? Maybe i have to lock something before navigating the page tables?

I'm working with kernel 2.6.35-22 and three levels page tables.

Thank you all!

like image 441
MirkoBanchi Avatar asked Jan 23 '12 23:01

MirkoBanchi


People also ask

Is there a page table per process?

Each process has its own page table in the kernel. Having a separate page table for each process is necessary for process isolation as they should not be allowed to stomp on each others memory. Since each process has a different page table, there is not one pmap that will work for every process.

What is page table in Linux?

A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses.

Where are process page tables stored?

The page table of the process is held in the kernel space. The kernel may have several page tables in RAM, but only one is the active page table. In x86 CPUs, it's the page table pointed by register CR3.

Does the kernel have a page table?

The kernel page table is a direct mapping to physical addresses, so that kernel virtual address x maps to physical address x. Xv6 also has a separate page table for each process's user address space, containing only mappings for that process's user memory, starting at virtual address zero.


2 Answers

pte_unmap(ptep); 

is missing just before the label out. Try to change the code in this way:

    ...
    page = pte_page(pte);
    if (page)
        printk(KERN_INFO "page frame struct is @ %p", page);

    pte_unmap(ptep); 

out:
like image 53
Giovanni Cabiddu Avatar answered Sep 20 '22 17:09

Giovanni Cabiddu


Look at /proc/<pid>/smaps filesystem, you can see the userspace memory:

cat smaps 
bfa60000-bfa81000 rw-p 00000000 00:00 0          [stack]
Size:                136 kB
Rss:                  44 kB

and how it is printed is via fs/proc/task_mmu.c (from kernel source):

http://lxr.linux.no/linux+v3.0.4/fs/proc/task_mmu.c

   if (vma->vm_mm && !is_vm_hugetlb_page(vma))
               walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
               show_map_vma(m, vma.....);
        seq_printf(m,
                   "Size:           %8lu kB\n"
                   "Rss:            %8lu kB\n"
                   "Pss:            %8lu kB\n"

And your function is somewhat like that of walk_page_range(). Looking into walk_page_range() you can see that the smaps_walk structure is not supposed to change while it is walking:

http://lxr.linux.no/linux+v3.0.4/mm/pagewalk.c#L153

For eg:

                }
 201                if (walk->pgd_entry)
 202                        err = walk->pgd_entry(pgd, addr, next, walk);
 203                if (!err &&
 204                    (walk->pud_entry || walk->pmd_entry || walk->pte_entry

If memory contents were to change, then all the above checking may get inconsistent.

All these just mean that you have to lock the mmap_sem when walking the page table:

   if (!down_read_trylock(&mm->mmap_sem)) {
            /*
             * Activate page so shrink_inactive_list is unlikely to unmap
             * its ptes while lock is dropped, so swapoff can make progress.
             */
            activate_page(page);
            unlock_page(page);
            down_read(&mm->mmap_sem);
            lock_page(page);
    }

and then followed by unlocking:

up_read(&mm->mmap_sem);

And of course, when you issue printk() of the pagetable inside your kernel module, the kernel module is running in the process context of your insmod process (just printk the "comm" and you can see "insmod") meaning the mmap_sem is lock, it also mean the process is NOT running, and thus there is no console output till the process is completed (all printk() output goes to memory only).

Sounds logical?

like image 35
Peter Teoh Avatar answered Sep 20 '22 17:09

Peter Teoh