Tuesday, 16 December 2014

Listing the directory structure into CSV format | Python

To avoid the complicated job of populating a database by filename and containing folder, we can instead save the data in CSV format.

The problem encountered was:

I had to create a database having 5 fields:

'object', 'category' , 'filename', 'source', 'date'

The source was a fixed string, and the date can easily be obtained in Python by importing the time library, as shown in the code below.
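As a quick illustration of the date field, here is a minimal sketch using the standard time module. The "%c" format is the one used in the script; the ISO-style "%Y-%m-%d" variant is only an assumption about what might sort better in a spreadsheet, not something the original database required.

```python
import time

# "%c" gives the locale's date and time representation (the format used in the script).
now = time.strftime("%c")

# Assumption: an ISO-style date, which sorts chronologically even as plain text.
today_iso = time.strftime("%Y-%m-%d")

print(now)
print(today_iso)
```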

Directories were in this order:

category/object/filename

To elaborate: in my root folder I had several category folders, such as Animals, Musical Instruments and Vehicles. Inside each of them I had sub-folders, for example dog and cat inside Animals, and likewise for the other categories. These sub-folders contained the .mp3 and .wav files. I needed to present this directory structure in the CSV format described above.
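The mapping from a file's location to the three path-derived fields can be sketched as below. This is only an illustration under the assumption that every file sits exactly two folders below the root; the helper name split_entry and the example paths are hypothetical and are not part of the script further down.

```python
import os

def split_entry(root, filepath):
    """Return (category, object, filename) for a file stored as root/category/object/filename."""
    # Path of the file relative to the root, e.g. 'Animals/dog/bark.wav'.
    relative = os.path.relpath(filepath, root)
    parts = relative.split(os.sep)
    if len(parts) != 3:
        # The file is not exactly two folders below the root.
        raise ValueError("expected category/object/filename, got: " + relative)
    category, obj, filename = parts
    return category, obj, filename

# Hypothetical example paths, for illustration only.
root = os.path.join("data", "sounds")
filepath = os.path.join(root, "Animals", "dog", "bark.wav")
print(split_entry(root, filepath))
```

Working relative to the root means the result does not depend on how deep the root folder itself is in the file system.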

Doing this manually for more than 2000 files is a Herculean job, so here is a script to simplify it:


import os
import csv
import time

def list_files(dirpath):
    files = []
    for dirname, dirnames, filenames in os.walk(dirpath):
        files += [os.path.join(dirname, filename) for filename in filenames]
    return files

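now = time.strftime("%c")  # current date and time in the locale's format
path = '/home/infinite/findsounds'

files = list_files(path)

# Python 3: open in text mode with newline=''. On Python 2, use mode 'wb' instead.
with open('/tmp/final.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['object', 'category', 'filename', 'source', 'date'])
    for filepath in files:
        # '/home/infinite/findsounds/<category>/<object>/<filename>' splits into
        # ['', 'home', 'infinite', 'findsounds', category, object, filename],
        # so the indices 4, 5 and 6 depend on the depth of the root path above.
        directories = filepath.split("/")
        writer.writerow([directories[5], directories[4], directories[6], 'www.findsounds.com', now])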

Please take care of the indentation while copying.
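The script records every file it finds under the root. If stray files (notes, thumbnails and so on) might sit next to the sounds, a variant of list_files that keeps only the audio extensions could look like the sketch below. The name list_audio_files and the default extension tuple are assumptions for illustration, based on the .mp3 and .wav files mentioned earlier.

```python
import os

def list_audio_files(dirpath, extensions=('.mp3', '.wav')):
    """Recursively collect paths under dirpath whose extension is in `extensions`."""
    matches = []
    for dirname, dirnames, filenames in os.walk(dirpath):
        for filename in filenames:
            # Compare in lower case so that 'BARK.WAV' is picked up as well.
            if os.path.splitext(filename)[1].lower() in extensions:
                matches.append(os.path.join(dirname, filename))
    # Sort so the rows come out in a stable order.
    return sorted(matches)
```

Calling list_audio_files(path) in place of list_files(path) leaves the rest of the script unchanged.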

A view of the generated CSV (LibreOffice, comma-separated view):


Small piece of code, hope it helps.

PS: Feel free to drop your suggestions/queries below.

Keep Scripting!
