Early Data Engineering: Quick Wins, Not Wizardry
If you’re early in the Data Engineering journey and you've already got some executive buy-in, chase at least one quick win. Prove value fast, accept some debt, and have a plan to pay it down. Use turnkey tools unless custom buys you a real advantage. The real bottleneck is time, not hardware, so optimize for shipping. Think long term, but be ready to defend the time spent with something tangible... or you may not get more time.
Two checklists to pin on the wall:
• Sources: rate of flow, consistency (NaN values or lack of flow?), error/dupe risk, timing (big gaps between pieces of the same pie?), schema (and how it changes), Change Data Capture (CDC) logic, upstream dependencies, quality checks, and whether reads hurt the source's performance.
• Storage: match read/write patterns; don’t do random-writes-on-object-store shenanigans; plan for scale; capture metadata/lineage/schema evolution from day one.
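The source-side checks above can be sketched as a cheap per-batch health report. This is a minimal illustration, not production code: the record shape, field names (`id`, `amount`, `ts`), and thresholds are all assumptions standing in for whatever your real feed looks like.

```python
from datetime import datetime, timedelta

# Hypothetical batch from a source feed; fields are illustrative.
records = [
    {"id": 1, "amount": 100.0, "ts": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "amount": None,  "ts": datetime(2024, 1, 1, 9, 5)},
    {"id": 2, "amount": None,  "ts": datetime(2024, 1, 1, 9, 5)},   # duplicate
    {"id": 3, "amount": 50.0,  "ts": datetime(2024, 1, 1, 13, 0)},  # big timing gap
]

def source_health(batch, max_gap=timedelta(hours=1)):
    """Per-batch checks: missing values, dupe risk, timing gaps."""
    nan_rate = sum(r["amount"] is None for r in batch) / len(batch)
    dupes = len(batch) - len({(r["id"], r["ts"]) for r in batch})
    ts = sorted(r["ts"] for r in batch)
    gaps = sum(b - a > max_gap for a, b in zip(ts, ts[1:]))
    return {"nan_rate": nan_rate, "duplicates": dupes, "large_gaps": gaps}

print(source_health(records))
# → {'nan_rate': 0.5, 'duplicates': 1, 'large_gaps': 1}
```

Running checks like these on every batch (and alerting on drift) is a lot cheaper than discovering a silent upstream failure a month later.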
For my current build, QuickBooks is the customer-data source of truth. This week: wrap up Intuit compliance (security is one of the primary undercurrents of data engineering!), get my full de-duped schema set in stone for future use, and hopefully begin the "load" portion of "ETL" (extract, transform, load).
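The de-dupe step in a "T" stage often boils down to last-write-wins on a stable key. A minimal sketch of that idea, assuming hypothetical customer rows with a `customer_id` key and an `updated_at` version field (not the real QuickBooks schema):

```python
# Keep only the latest version of each record, keyed on customer_id.
# Field names here are assumptions for illustration only.
rows = [
    {"customer_id": "C1", "name": "Acme",      "updated_at": 1},
    {"customer_id": "C1", "name": "Acme Corp", "updated_at": 2},  # newer wins
    {"customer_id": "C2", "name": "Globex",    "updated_at": 1},
]

def dedupe_latest(rows, key="customer_id", version="updated_at"):
    """Last-write-wins de-duplication: one row per key, highest version kept."""
    latest = {}
    for r in rows:
        k = r[key]
        if k not in latest or r[version] > latest[k][version]:
            latest[k] = r
    return list(latest.values())

print(sorted(r["name"] for r in dedupe_latest(rows)))
# → ['Acme Corp', 'Globex']
```

Pinning down the key and version semantics now is exactly the "schema set in stone" work: once downstream consumers depend on one-row-per-customer, the rule can't quietly change.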
Still on deck from personal study time: working through the end of my AI/ML bootcamp. I managed to get my prompt injection detection model performing to the standard I wanted (~1/12,000 missed detection rate, with a similar false positive rate).
Stay smart & stay safe!
-Luke