I hope that all is well with you. This week, I continued to pitch in on auditing delivery times to improve projections and made progress on my project to create a comprehensive list of customer profiles. The project has involved a lot of trial and error, but it has been a great learning experience.
To provide context, aggregating data from multiple data sets into a single comprehensive list is done through querying: writing code in Structured Query Language (SQL) that selects particular columns from specified tables and keeps only the rows that satisfy a set of conditions. I did this in Domo, a platform with SQL built in that will also let me use the final data set to design visualizations.
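For readers who have not seen SQL before, here is a minimal sketch of that kind of query. It runs through Python's built-in sqlite3 module rather than Domo, and the tables (customers, contacts), columns, and rows are hypothetical stand-ins, not the actual data sets from this project.

```python
import sqlite3

# Hypothetical tables standing in for two of the source data sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE contacts (customer_id INTEGER, phone TEXT, email TEXT);
INSERT INTO customers VALUES (1, 'Ada', 'West'), (2, 'Grace', 'East'), (3, 'Linus', 'West');
INSERT INTO contacts VALUES
  (1, '555-0100', 'ada@example.com'),
  (2, NULL, 'grace@example.com'),
  (3, '555-0102', NULL);
""")

# Select particular columns from the specified tables, keeping only the rows
# that satisfy the condition in the WHERE clause.
query = """
SELECT c.customer_id, c.name, k.phone, k.email
FROM customers AS c
LEFT JOIN contacts AS k ON k.customer_id = c.customer_id
WHERE c.region = 'West'
ORDER BY c.customer_id
"""
profiles = conn.execute(query).fetchall()
for row in profiles:
    print(row)
```

The LEFT JOIN combines the two tables while keeping customers who have no matching contact row, and the WHERE clause is the "set of conditions" that decides which rows make it into the final list.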
Before a dataflow runs in full, Domo offers a nice feature that makes it possible to test SQL code on the first 100 rows of the data set. Under normal circumstances, this is a convenient way to confirm that the code works and does what it is expected to do. I tested my code using this feature, and it ran without errors and did what I hoped it would. However, as I would soon discover, this approach has an inconvenient but very understandable caveat: a successful test on 100 rows says nothing about how long the full run will take. Domo does not expect a data set to have more than two million rows, and if a dataflow is so large that it cannot be completed within 24 hours, Domo cuts it off and returns a 0% populated output data set, after one has already waited 24 hours!
Once I understood this limitation, I took some time to think methodically about how to work around it and to craft a targeted plan. After some thought, I elected to use a tried-and-true strategy: break one prohibitively large problem into a series of smaller, manageable ones. To this end, I split the single large dataflow into four smaller ones.
This turned out to be an interesting strategy. I divided the records in the original data set among the four dataflows based on whether the phone number and email fields were populated in the existing customer profiles. While each dataflow involved significantly less data than the original data set, the four differed in how many records they contained. Let’s do the play-by-play…
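The split can be sketched as four WHERE clauses, one per dataflow. This is a minimal illustration under several assumptions: it uses Python's sqlite3 module instead of Domo, the profiles table and its rows are hypothetical, the four groups are assumed to be the four combinations of "phone populated" and "email populated", and an empty string is assumed to count as not populated. Which group corresponds to which numbered dataflow is not specified here.

```python
import sqlite3

# Hypothetical customer profiles with a mix of populated and missing fields.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profiles (customer_id INTEGER PRIMARY KEY, phone TEXT, email TEXT);
INSERT INTO profiles VALUES
  (1, '555-0100', 'ada@example.com'),
  (2, NULL, 'grace@example.com'),
  (3, '555-0102', NULL),
  (4, NULL, NULL),
  (5, '', 'linus@example.com');
""")

# Assumption: NULL and empty strings both mean "not populated".
# Each expression evaluates to 0 or 1 (never NULL), so NOT is safe to apply.
PHONE = "(phone IS NOT NULL AND phone <> '')"
EMAIL = "(email IS NOT NULL AND email <> '')"

# One WHERE clause per hypothetical dataflow; together they cover every
# combination of the two flags exactly once.
PARTITIONS = {
    "phone_and_email": f"{PHONE} AND {EMAIL}",
    "phone_only": f"{PHONE} AND NOT {EMAIL}",
    "email_only": f"NOT {PHONE} AND {EMAIL}",
    "neither": f"NOT {PHONE} AND NOT {EMAIL}",
}


def partition_ids(conn):
    """Return the customer IDs that fall into each partition."""
    result = {}
    for name, condition in PARTITIONS.items():
        rows = conn.execute(
            f"SELECT customer_id FROM profiles WHERE {condition} ORDER BY customer_id"
        ).fetchall()
        result[name] = [r[0] for r in rows]
    return result


parts = partition_ids(conn)
print(parts)
```

Because each condition evaluates to true or false for every row, every record lands in exactly one group, so no record is lost or duplicated across the four dataflows even though the groups can end up very different in size.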
Dataflow One and Dataflow Two turned in an impressive performance, finishing quite quickly, with Dataflow One just barely edging out Dataflow Two in the home stretch during the final hour. However, Dataflow Three and Dataflow Four, which contained the most rows, had other plans. Dataflow Three delivered a spectacular surprise when I woke up to find it complete, clocking in at 15 hours, 41 minutes, and an even ten seconds. Dataflow Four kept all the fans (all one of them) on their toes, finishing just before the 24-hour termination mark at 23 hours, 32 minutes, and six seconds: an exhilarating buzzer beater, to say the least!
In the end, my biggest takeaways were not about the success or failure of the dataflows themselves but rather about professional life in the workplace. As I pushed through lots of trial and error before finally finding a working strategy, I learned how to balance asking for help with working through problems on my own. I respected the time of my boss and teammates while still asking for help when that was the most efficient way forward. I also continued to hone my time management skills: when dataflows take almost 24 hours (and, even so, often fail!), it is important to run them overnight and to plan ahead. I am confident that these lessons will serve me well in the future as I dig deeper into professional life.
I have enjoyed reading your questions and comments, and I am happy to answer anything else you would like to ask this week. Just let me know!
P.S. This week’s Stat of the Week is about changes in internet usage due to COVID-19. According to Forbes, internet usage has increased significantly, by between 50% and 70% depending on the estimate used and the location of interest. In many ways, this is to be expected given the rise in teleworking, streaming, and online shopping, but the magnitude of the increase is still quite amazing. Neat stuff all around!