How to list all files and directories in another Data Lake Gen2 storage account that is in a different subscription

Question 1

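I want to get all the files, including those in subdirectories, from a container in an Azure storage account that belongs to a different subscription. The business requirement is to use the abfss URL: abfss://@.dfs.core.windows.net//. I tried importing the Spark configuration for that subscription and used the following code to return the list of files, but it failed.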

import os
from fnmatch import fnmatch
root_list="abfss://[email protected]/staging/"
files_list = []
pattern = "*.*"
print(pattern)
for path, subdirs, files in os.walk(root_list):
  for name in files:
    if fnmatch(name.upper(), pattern.upper()):
      files_list.append(path+"/"+name)

This prints "[]", an empty list.

Karthikeyan Rasipalay Durairaj · Answer 1 · 2021-11-23T15:09:53

You can use the following code for this use case.

from datetime import datetime
# BlockBlobService is part of the legacy SDK (azure-storage-blob 2.1 and earlier).
from azure.storage.blob import BlockBlobService

account_name = 'accountname'
container_name = 'container_name'
second_container_name = 'data'
account_key = 'storage-account-key'
prefix_val = second_container_name + '/'

block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)

# block_blob_service.create_container(container_name)
# List every blob in the container whose name starts with the prefix.
generator = block_blob_service.list_blobs(container_name, prefix=prefix_val)
report_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

target_path = "/target2/%s.csv" % container_name
print(target_path)

# The with block closes the output file even if a request fails part-way through.
with open(target_path, 'w') as target_file:
    for blob in generator:
        # Fetch the properties once per blob instead of once per field.
        props = block_blob_service.get_blob_properties(container_name, blob.name).properties
        last_modified = props.last_modified
        file_size = props.content_length
        blob_type = props.blob_type
        tier_change_time = props.blob_tier_change_time  # tier change time, not creation time
        # Skip zero-byte entries; this also drops empty files.
        if file_size != 0:
            line = account_name + '|' + container_name + '|' + blob.name + '|' + str(file_size) + '|' + str(last_modified)[:10] + '|'
            print(line)
            target_file.write(line + '\n')
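
A note on why the code in the question prints "[]": os.walk only traverses the local filesystem, and by default it silently yields nothing for a path it cannot open, so an abfss:// URL produces an empty list rather than an error. If the abfss URL is a hard requirement, one option is to do the recursion yourself on top of whatever directory lister your environment provides. The following is a minimal sketch, not a definitive implementation. The walk_files function and its list_dir parameter are hypothetical names introduced here; list_dir is assumed to return (path, is_dir) pairs for one directory level.

```python
from fnmatch import fnmatch


def walk_files(root, list_dir, pattern="*"):
    """Return every file path under root whose file name matches pattern.

    list_dir(path) is a caller-supplied function (an assumption of this
    sketch) that returns an iterable of (path, is_dir) tuples for the
    immediate children of path.
    """
    found = []
    stack = [root]  # directories still to be listed
    while stack:
        current = stack.pop()
        for path, is_dir in list_dir(current):
            if is_dir:
                stack.append(path)
                continue
            # Compare only the last path segment, case-insensitively,
            # as the code in the question does.
            name = path.rstrip("/").rsplit("/", 1)[-1]
            if fnmatch(name.upper(), pattern.upper()):
                found.append(path)
    return found


# Assumption: on Databricks (the question does not name the platform),
# dbutils.fs.ls returns entries with a .path attribute and an .isDir()
# method, so an adapter could look like this:
#
# def dbutils_ls(path):
#     return [(f.path, f.isDir()) for f in dbutils.fs.ls(path)]
#
# files_list = walk_files(root_list, dbutils_ls, "*.*")
```

The subscription itself should not matter for this kind of listing as long as the session holds valid credentials for the target storage account, for example an account key set through the Spark configuration before the first call.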
