How to get the last modification time of each file present in Azure Data Lake Storage using Python in a Databricks workspace?

I am trying to get the last modification time of each file present in Azure Data Lake.

files = dbutils.fs.ls('/mnt/blob')

for fi in files: print(fi)

Output: FileInfo(path='dbfs:/mnt/blob/rule_sheet_recon.xlsx', name='rule_sheet_recon.xlsx', size=10843)

Here I am unable to get the last modification time of the files. Is there any way to get that property?
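As an aside, on newer Databricks runtimes (an assumption; older runtimes may not have this) the FileInfo objects returned by dbutils.fs.ls also expose a modificationTime field in epoch milliseconds, which can be converted with a small helper:

```python
from datetime import datetime, timezone

def millis_to_datetime(ms):
    """Convert epoch milliseconds (as in FileInfo.modificationTime) to a UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# On a runtime that exposes modificationTime, usage would look like:
# for fi in dbutils.fs.ls('/mnt/blob'):
#     print(fi.name, millis_to_datetime(fi.modificationTime))
print(millis_to_datetime(1600598400000))  # 2020-09-20 10:40:00+00:00
```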

I tried the shell command below to see the properties, but I am unable to store its output in a Python object.

%sh ls -ls /dbfs/mnt/blob/

Output: total 0

0 -rw-r--r-- 1 root root 13577 Sep 20 10:50 a.txt

0 -rw-r--r-- 1 root root 10843 Sep 20 10:50 b.txt
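If shelling out is acceptable, the listing can be captured in a Python object with subprocess instead of the %sh magic. A minimal sketch, assuming the DBFS FUSE path /dbfs/mnt/blob is the directory of interest:

```python
import subprocess

def list_dir(path):
    """Run `ls -l` on a local path and return its output as a list of lines."""
    result = subprocess.run(
        ["ls", "-l", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# Hypothetical Databricks usage (the mount path is an assumption):
# lines = list_dir("/dbfs/mnt/blob")
```

Each line can then be split or parsed in Python, rather than being lost in the notebook cell output.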

asked Nov 15 '22 by Satyaranjan Behera

1 Answer

We can use the os package to get this information. For example, in PySpark:

import os

def get_filemtime(filename):
  return os.path.getmtime(filename)

You can pass the absolute local path of the file via the DBFS FUSE mount, e.g. /dbfs/mnt/adls/logs/ehub/app/0/2021/07/21/15/05/40.avro (note that os.path.getmtime does not understand the dbfs:/ URI scheme, only local paths).
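Since os.path.getmtime returns epoch seconds as a float, it is usually convenient to convert the result to a datetime. A sketch extending the helper above; the temporary file stands in for a real /dbfs/... path, which is an assumption about your mount layout:

```python
import os
import tempfile
from datetime import datetime, timezone

def get_filemtime_dt(filename):
    """Last-modified time of a local file as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(filename), tz=timezone.utc)

# Demonstration with a temporary file; on Databricks you would pass a
# FUSE path such as /dbfs/mnt/blob/a.txt (hypothetical) instead.
with tempfile.NamedTemporaryFile() as f:
    print(get_filemtime_dt(f.name))
```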

answered Dec 20 '22 by Naveen Anto