i'm trying to navigate the page tables for a process in linux. In a kernel module i realized the following function:
static struct page *walk_page_table(unsigned long addr)
{
pgd_t *pgd;
pte_t *ptep, pte;
pud_t *pud;
pmd_t *pmd;
struct page *page = NULL;
struct mm_struct *mm = current->mm;
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd) || pgd_bad(*pgd))
goto out;
printk(KERN_NOTICE "Valid pgd");
pud = pud_offset(pgd, addr);
if (pud_none(*pud) || pud_bad(*pud))
goto out;
printk(KERN_NOTICE "Valid pud");
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd) || pmd_bad(*pmd))
goto out;
printk(KERN_NOTICE "Valid pmd");
ptep = pte_offset_map(pmd, addr);
if (!ptep)
goto out;
pte = *ptep;
page = pte_page(pte);
if (page)
printk(KERN_INFO "page frame struct is @ %p", page);
out:
return page;
}
This function is called from the ioctl
and addr
is a virtual address in process address space:
static int my_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long addr)
{
struct page *page = walk_page_table(addr);
...
return 0;
}
The strange thing is that calling ioctl
in a user space process, this segfaults...but it seems that the way i'm looking for the page table entry is correct because with dmesg
i obtain for example for each ioctl
call:
[ 1721.437104] Valid pgd
[ 1721.437108] Valid pud
[ 1721.437108] Valid pmd
[ 1721.437110] page frame struct is @ c17d9b80
So why the process can't complete correcly the `ioctl' call? Maybe i have to lock something before navigating the page tables?
I'm working with kernel 2.6.35-22 and three levels page tables.
Thank you all!
Each process has its own page table in the kernel. Having a separate page table for each process is necessary for process isolation as they should not be allowed to stomp on each others memory. Since each process has a different page table, there is not one pmap that will work for every process.
A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses.
The page table of the process is held in the kernel space. The kernel may have several page tables in RAM, but only one is the active page table. In x86 CPUs, it's the page table pointed by register CR3.
The kernel page table is a direct mapping to physical addresses, so that kernel virtual address x maps to physical address x. Xv6 also has a separate page table for each process's user address space, containing only mappings for that process's user memory, starting at virtual address zero.
pte_unmap(ptep);
is missing just before the label out. Try to change the code in this way:
...
page = pte_page(pte);
if (page)
printk(KERN_INFO "page frame struct is @ %p", page);
pte_unmap(ptep);
out:
Look at /proc/<pid>/smaps
filesystem, you can see the userspace memory:
cat smaps
bfa60000-bfa81000 rw-p 00000000 00:00 0 [stack]
Size: 136 kB
Rss: 44 kB
and how it is printed is via fs/proc/task_mmu.c
(from kernel source):
http://lxr.linux.no/linux+v3.0.4/fs/proc/task_mmu.c
if (vma->vm_mm && !is_vm_hugetlb_page(vma))
walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
show_map_vma(m, vma.....);
seq_printf(m,
"Size: %8lu kB\n"
"Rss: %8lu kB\n"
"Pss: %8lu kB\n"
And your function is somewhat like that of walk_page_range(). Looking into walk_page_range() you can see that the smaps_walk structure is not supposed to change while it is walking:
http://lxr.linux.no/linux+v3.0.4/mm/pagewalk.c#L153
For eg:
}
201 if (walk->pgd_entry)
202 err = walk->pgd_entry(pgd, addr, next, walk);
203 if (!err &&
204 (walk->pud_entry || walk->pmd_entry || walk->pte_entry
If memory contents were to change, then all the above checking may get inconsistent.
All these just mean that you have to lock the mmap_sem when walking the page table:
if (!down_read_trylock(&mm->mmap_sem)) {
/*
* Activate page so shrink_inactive_list is unlikely to unmap
* its ptes while lock is dropped, so swapoff can make progress.
*/
activate_page(page);
unlock_page(page);
down_read(&mm->mmap_sem);
lock_page(page);
}
and then followed by unlocking:
up_read(&mm->mmap_sem);
And of course, when you issue printk() of the pagetable inside your kernel module, the kernel module is running in the process context of your insmod process (just printk the "comm" and you can see "insmod") meaning the mmap_sem is lock, it also mean the process is NOT running, and thus there is no console output till the process is completed (all printk() output goes to memory only).
Sounds logical?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With