Week 1: Working with workflow

Mar 10, 2020

Welcome back, everyone!

I have just finished my first week at Unanet. On my first day, my onsite advisor gave me an assignment that replicates my final product at a much simpler level. This base-level deliverable would (1) run a batch file that creates a text file within the same folder, (2) run another batch file that does the same thing as soon as step 1 finishes, and (3) create a text file from the program itself.

For those who don’t know what a batch file is (don’t worry, I didn’t either), it is a list of commands within a file that, when run, can carry out functions on its own, like creating a text file with some words in it. Running a batch file and creating a text file are not very hard to do by themselves in Python; there are functions that can do each in one line. The goal, however, was to carry them out within Luigi, the workflow library that I chose to use.

Learnings

Luigi (not Mario’s brother, sorry) is a Python library that can be installed in order to run through a workflow like the one outlined above. Tasks make up the structure of the workflow, and each task is split into three sections: ‘requires(self),’ ‘run(self),’ and ‘output(self).’ The requires section describes what must be finished before the task can run (another task, for example). The run section is the main substance and code of the task (where I would tell it to run the batch files). The output section outlines what needs to be present in order for the task to be considered ‘done’ (I would write that the new files must be present).

I run these Luigi programs from the Windows Command Prompt. To start a run, you name one specific task (usually the last one, which requires the others). One concept that took me a while to figure out is that when Luigi goes through the script you write, it looks at the output section first, not last. If the specified output of a task is already present, Luigi treats that task as done, does nothing more for it, and moves on. So if Task A requires Tasks B and C, but the output of Task A already exists, B and C will not run.
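The output-first rule can be illustrated with a toy model in plain Python. This is not Luigi's actual implementation; `ToyTask`, `build`, and the `done` flag (standing in for "the output file already exists") are invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    name: str
    requires: list = field(default_factory=list)
    done: bool = False  # stands in for "the output file already exists"


def build(task, log):
    """Record in `log` the order in which tasks would run."""
    if task.done:
        # Output already present: skip the task and never visit its requirements.
        return
    for dep in task.requires:
        build(dep, log)
    log.append(task.name)
    task.done = True


# Nothing exists yet, so B and C run before A.
fresh = []
build(ToyTask("A", requires=[ToyTask("B"), ToyTask("C")]), fresh)
print(fresh)  # ['B', 'C', 'A']

# A's output already exists, so nothing runs, not even B or C.
stale = []
build(ToyTask("A", requires=[ToyTask("B"), ToyTask("C")], done=True), stale)
print(stale)  # []
```

In practice this means that deleting a task's old output file is what makes Luigi run that task again.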

After figuring all of this out, I wrote a Luigi script that worked.

Experience

I would venture to say that this week was even harder than saying the name of this blog post 5 times fast. Especially on the first day, I felt a bit in over my head, finding more and more problems the deeper I dug into the assignment. Very gradually, I gained a general understanding of Luigi, the Windows Command Prompt, and Python in general. If you look back to my proposal, abstract, and introductory blog, you will see that the ‘week 1’ that I envisioned was not very similar to the one I have just described. This is not a bad thing; it was great to have bursts of satisfaction when I finally figured out a concept that I had been looking into for a long time. In the following weeks, I think I will start trying to incorporate XML files into Luigi so that they can outline the tasks and workflow that I would like to carry out.

Thanks for reading… stay tuned!

One Reply to “Week 1: Working with workflow”

  1. Tad B. says:

    “Luigi (not Mario’s brother, sorry)” :). I like your writing style in this blog, and the content is equally robust. Looking forward to hearing more in the coming weeks!
