I get questions like this a lot:

  • Where did this data come from?
  • How do I know I can trust the source?
  • What types of QA checks were applied to this data?

Data lineage is a chronic issue in data engineering. This blog post from Airbyte gives a good overview and mentions some interesting products and projects that might help with it.

Unfortunately, I have limited flexibility to purchase or install tools for this in my current role. Has anyone rolled their own solution for this?
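To make "rolling your own" concrete, here is a minimal sketch of the kind of thing I have in mind, using only the Python standard library. It assumes a single SQLite table is enough to answer the three questions above. The table name `lineage`, its columns, and the helper functions `init_lineage`, `record_step`, and `trace` are all hypothetical names made up for illustration, not part of any existing tool.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_lineage(conn):
    """Create the (hypothetical) lineage table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS lineage (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            dataset     TEXT NOT NULL,  -- what was produced
            source      TEXT NOT NULL,  -- where it came from
            transform   TEXT NOT NULL,  -- what was done to it
            qa_checks   TEXT NOT NULL,  -- JSON map of check name -> result
            recorded_at TEXT NOT NULL   -- UTC timestamp, ISO 8601
        )
        """
    )


def record_step(conn, dataset, source, transform, qa_checks):
    """Log one pipeline step: source -> transform -> dataset, plus QA results."""
    conn.execute(
        "INSERT INTO lineage (dataset, source, transform, qa_checks, recorded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            dataset,
            source,
            transform,
            json.dumps(qa_checks),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()


def trace(conn, dataset):
    """Walk upstream from a dataset and return every recorded step."""
    steps, seen, frontier = [], set(), [dataset]
    while frontier:
        current = frontier.pop()
        if current in seen:  # guard against cycles in the recorded graph
            continue
        seen.add(current)
        rows = conn.execute(
            "SELECT source, transform, qa_checks FROM lineage "
            "WHERE dataset = ? ORDER BY id",
            (current,),
        ).fetchall()
        for source, transform, qa in rows:
            steps.append(
                {
                    "dataset": current,
                    "source": source,
                    "transform": transform,
                    "qa_checks": json.loads(qa),
                }
            )
            frontier.append(source)
    return steps
```

Each pipeline job would call `record_step` once when it finishes, and `trace(conn, "some_dataset")` then lists where that dataset came from and which checks ran along the way. It is nowhere near a full lineage tool (no column-level lineage, no schema history), but it needs nothing installed beyond Python.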

Apache NiFi maintains a lineage table for its data movement and transformations.
