
[{"content":"","date":"10 March 2026","externalUrl":null,"permalink":"/","section":"Albin Cikaj","summary":"","title":"Albin Cikaj","type":"page"},{"content":"Writing about data engineering, systems thinking, and things I\u0026rsquo;m figuring out as I go.\n","date":"10 March 2026","externalUrl":null,"permalink":"/blog/","section":"Blog","summary":"","title":"Blog","type":"blog"},{"content":"","date":"10 March 2026","externalUrl":null,"permalink":"/tags/career/","section":"Tags","summary":"","title":"Career","type":"tags"},{"content":"","date":"10 March 2026","externalUrl":null,"permalink":"/tags/data-engineering/","section":"Tags","summary":"","title":"Data-Engineering","type":"tags"},{"content":"","date":"10 March 2026","externalUrl":null,"permalink":"/tags/engineering/","section":"Tags","summary":"","title":"Engineering","type":"tags"},{"content":"When people hear I studied Mechanical Engineering and now work in data, I usually get one of two reactions: confusion, or \u0026ldquo;oh that makes sense.\u0026rdquo; The second group is right.\nThis isn\u0026rsquo;t a story about a dramatic pivot. It\u0026rsquo;s about recognizing that engineering is engineering, and the tools are incidental.\nWhat Actually Transfers # Systems thinking. Before I wrote a single SQL query, I spent years thinking about how systems behave — how components interact, where failures propagate, what happens under load. A data pipeline is a system. Data quality problems are usually systems problems.\nTolerance for ambiguity. Mechanical engineering problems are often underdetermined — there are multiple valid solutions and you have to make judgment calls. Data engineering is exactly the same. The \u0026ldquo;right\u0026rdquo; schema design or the \u0026ldquo;right\u0026rdquo; pipeline architecture depends on constraints you have to surface and reason about.\nCare about reliability. A mechanical component that fails 1% of the time is not acceptable. I carried that instinct into data work. 
A pipeline that silently drops rows 1% of the time is not acceptable either, even if nobody notices immediately.\nWhat I Had to Unlearn # Expecting clean inputs. In mechanical design, your material properties are known. In data, the inputs are chaotic — missing fields, format changes, upstream schema drift. You design for failure, not for the happy path.\nWaterfall thinking. Mechanical projects often have long design-before-build cycles. Data work is more iterative. Ship something that works, observe it in production, improve it. The feedback loop is shorter and you should use it.\nPerfectionism before shipping. A well-designed CAD model needs to be right before it goes to manufacturing. A data model can be wrong in production and corrected. Shipping a 90% solution and improving it beats waiting for 100%.\nThe Actual Transition # The technical transition was mostly self-taught: Python, SQL, then the ecosystem around them (dbt, Airflow, cloud platforms). The learning curve was real but not the hardest part.\nThe harder part was calibrating expectations — understanding that in data work, you\u0026rsquo;re often building infrastructure that others depend on invisibly. When it works, nobody notices. When it breaks, everyone does. That\u0026rsquo;s a different kind of responsibility than designing a part that you can inspect and test before it ships.\nWould I Recommend This Path? # If you have a quantitative engineering background and are curious about data, yes. The fundamentals transfer more than you\u0026rsquo;d expect. The specific tools you can learn.\nWhat you can\u0026rsquo;t shortcut is the habit of thinking carefully about systems, reliability, and what happens when things go wrong. 
If you already have that from another engineering discipline, you\u0026rsquo;re closer than you think.\n","date":"10 March 2026","externalUrl":null,"permalink":"/blog/mechanical-to-data-engineer/","section":"Blog","summary":"","title":"From Mechanical Engineering to Data Engineering","type":"blog"},{"content":"","date":"10 March 2026","externalUrl":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"","date":"1 March 2026","externalUrl":null,"permalink":"/tags/hugo/","section":"Tags","summary":"","title":"Hugo","type":"tags"},{"content":"A collection of projects I\u0026rsquo;ve built across data engineering, automation, and side experiments. Each one documents what I set out to solve, the decisions I made along the way, and what I\u0026rsquo;d do differently.\n","date":"1 March 2026","externalUrl":null,"permalink":"/projects/","section":"Projects","summary":"","title":"Projects","type":"projects"},{"content":"","date":"1 March 2026","externalUrl":null,"permalink":"/tags/static-site/","section":"Tags","summary":"","title":"Static-Site","type":"tags"},{"content":"","date":"1 March 2026","externalUrl":null,"permalink":"/tags/tailwind/","section":"Tags","summary":"","title":"Tailwind","type":"tags"},{"content":" Overview # This site itself is a project worth documenting. It\u0026rsquo;s a statically generated portfolio and blog built with Hugo and the Blowfish theme, designed to be fast, minimal, and easy to maintain.\nThe goal: a place to document my projects, write about things I\u0026rsquo;m working through, and have a corner of the internet that\u0026rsquo;s mine — without the overhead of managing a CMS or a server.\nWhy Hugo # Fast builds — Hugo compiles in milliseconds No runtime dependencies — the output is pure HTML/CSS/JS Markdown-first — writing content is just writing markdown files Blowfish theme — modern, Tailwind-based, well-documented Static sites make sense here. 
There\u0026rsquo;s no dynamic content, no user authentication, nothing that requires a server. The output can be deployed to any CDN.\nStack # Static site generator: Hugo Theme: Blowfish (Tailwind CSS) Color scheme: Ocean (dark mode default) Hosting: TBD (GitHub Pages / Cloudflare Pages / Netlify) Structure # content/ ├── about/ # About page ├── projects/ # Portfolio projects (this is one of them) └── blog/ # Blog posts Each project is a self-contained directory with an index.md — this makes it easy to add images, data files, or other assets alongside the content.\nCustomization # The main customization over the default Blowfish theme:\nNavigation: Projects, Blog, About Homepage layout: Profile with custom headline Dark mode default with auto-switching Card view on project listing What I\u0026rsquo;d Do Differently # Set up the content structure before touching config — I spent time debugging broken menu links that just needed the right directory to exist Decide on the URL structure early (e.g. /projects/ vs /work/) before writing content ","date":"1 March 2026","externalUrl":null,"permalink":"/projects/personal-website/","section":"Projects","summary":"","title":"This Website","type":"projects"},{"content":"","date":"1 March 2026","externalUrl":null,"permalink":"/tags/web/","section":"Tags","summary":"","title":"Web","type":"tags"},{"content":"Most data teams know they should be testing their dbt models. Fewer have a clear strategy for what to test, where to add tests, and how to avoid a false sense of security from tests that don\u0026rsquo;t catch real problems.\nThis is what I\u0026rsquo;ve landed on after a few iterations.\nThe Problem with \u0026ldquo;Just Add Tests\u0026rdquo; # Adding not_null and unique tests to every column feels thorough. It isn\u0026rsquo;t. 
These tests catch structural problems but miss business logic errors — the kind where the data is technically valid but numerically wrong.\nThe goal of a testing strategy isn\u0026rsquo;t maximum coverage. It\u0026rsquo;s catching the failures that would actually hurt someone.\nLayer-by-Layer Testing # Staging models — test the raw data you don\u0026rsquo;t control:\nnot_null on required fields accepted_values on status/type enums unique on natural keys you\u0026rsquo;re treating as unique These tests catch upstream schema changes early, before they propagate into marts.\nIntermediate models — test join logic:\nRow count assertions (joining A to B should never multiply rows unless expected) relationships tests across foreign keys Custom tests for business rules that aren\u0026rsquo;t obvious from column names alone Mart models — test business correctness:\nMetric bounds (revenue shouldn\u0026rsquo;t be negative, conversion rate shouldn\u0026rsquo;t exceed 100%) Reconciliation against known totals when available Freshness assertions via dbt source freshness Custom Tests Worth Having # The built-in generic tests (not_null, unique, accepted_values, relationships) cover a lot. A few custom macros I\u0026rsquo;ve found useful:\nRow count comparison between runs — catch unexpected drops in data volume. Date range validation — events should fall within a plausible window. Metric reconciliation — compare a rolled-up mart total against a source system total.\nWhat Not to Test # Don\u0026rsquo;t test transformations that dbt itself guarantees (e.g., a coalesce works how it says it does) Don\u0026rsquo;t duplicate tests across layers — if staging validates an enum, intermediate doesn\u0026rsquo;t need to re-validate it Don\u0026rsquo;t add severity: error on tests you\u0026rsquo;d actually want to warn about. Use warn for informational tests, error for blocking ones Making Tests Useful in Practice # Tests are only useful if they run. Set up CI to run dbt test on every PR. 
Set up alerting on production run failures.\nAlso: when a test fails, the fix should be obvious. If you have to investigate what the test means before you can fix it, the test isn\u0026rsquo;t well-named or well-documented. Add a description.\nThe Honest Limitation # dbt tests are great at catching structural and invariant violations. They\u0026rsquo;re weak at catching subtle logic errors — a miscalculated metric that\u0026rsquo;s off by 3% will pass every structural test. For that, you need either reconciliation against a source of truth or a human who knows what the numbers should look like.\nTests are a layer of defense, not a proof of correctness.\n","date":"15 February 2026","externalUrl":null,"permalink":"/blog/dbt-testing-strategy/","section":"Blog","summary":"","title":"A Practical dbt Testing Strategy","type":"blog"},{"content":"","date":"15 February 2026","externalUrl":null,"permalink":"/tags/data-quality/","section":"Tags","summary":"","title":"Data-Quality","type":"tags"},{"content":"","date":"15 February 2026","externalUrl":null,"permalink":"/tags/dbt/","section":"Tags","summary":"","title":"Dbt","type":"tags"},{"content":"","date":"15 February 2026","externalUrl":null,"permalink":"/tags/sql/","section":"Tags","summary":"","title":"Sql","type":"tags"},{"content":"","date":"1 October 2025","externalUrl":null,"permalink":"/tags/airflow/","section":"Tags","summary":"","title":"Airflow","type":"tags"},{"content":" Overview # This project covers building a production-grade ELT pipeline that ingests raw data from multiple sources, transforms it through a layered data model, and serves it to downstream consumers with quality guarantees.\nThe goal was to move away from fragile, one-off scripts and into a reproducible, observable pipeline that could be maintained and extended without fear.\nProblem # The existing data workflow was a collection of ad-hoc Python scripts run manually on a schedule. 
There was no lineage, no alerting, and no way to know if the data was stale or wrong until someone noticed a dashboard looked off.\nArchitecture # Sources (APIs, DB) │ ▼ [Extraction Layer] Python + custom connectors │ ▼ [Raw Storage] PostgreSQL / S3 staging │ ▼ [Transformation Layer] dbt (staging → intermediate → marts) │ ▼ [Serving Layer] Analytical views / BI tools Orchestration: Apache Airflow with DAGs per source Transformation: dbt with full lineage and tests Data Quality: dbt tests (not-null, unique, referential integrity) + custom macros Monitoring: Airflow alerts on failure, dbt run results logged\nKey Decisions # Why dbt for transformation? SQL-first transformations with built-in testing, documentation generation, and lineage. The learning curve is low for anyone who knows SQL, which makes it easier to hand off.\nWhy Airflow over simpler schedulers? Complex dependency graphs between DAGs, retry logic, and the need for a UI to inspect historical runs. For a simpler project, Prefect or even cron would be fine.\nLayered data model (staging → intermediate → marts) Keeps raw data untouched, makes transformations auditable, and prevents the \u0026ldquo;who touched this?\u0026rdquo; problem.\nWhat I Learned # Data quality tests need to run at every layer, not just the final mart Backfilling is always harder than you think — design for it from the start Documentation in dbt is surprisingly useful when you return to a project six months later Tech Stack # Layer Tool Orchestration Apache Airflow Transformation dbt Storage PostgreSQL Language Python 3.11 Containerization Docker CI GitHub Actions ","date":"1 October 2025","externalUrl":null,"permalink":"/projects/data-pipeline-orchestration/","section":"Projects","summary":"","title":"End-to-End Data Pipeline Orchestration","type":"projects"},{"content":"","date":"1 October 2025","externalUrl":null,"permalink":"/tags/postgresql/","section":"Tags","summary":"","title":"Postgresql","type":"tags"},{"content":"","date":"1 
October 2025","externalUrl":null,"permalink":"/tags/python/","section":"Tags","summary":"","title":"Python","type":"tags"},{"content":" Overview # A deep dive into designing a data warehouse for analytical workloads — from choosing a modeling approach to handling slowly changing dimensions and optimizing for query performance.\nThis project documents the decisions behind a warehouse I built to consolidate multiple operational databases into a single source of truth for reporting.\nProblem # Multiple teams were pulling data from different operational databases and getting different numbers. There was no agreed-upon definition of core metrics, no historical snapshots, and no way to run cross-domain analytics without painful joins across systems.\nModeling Approach # I chose a Kimball-style dimensional model over a Data Vault for this use case. The reasoning:\nTeam was small, iteration speed mattered Consumers were primarily BI analysts who understood dimensional modeling Data Vault made more sense at a larger scale with more sources Layer structure:\nStaging — raw source data, typed and renamed, no business logic Intermediate — source-specific transformations and joins Dimensions — slowly changing dimension tables (SCD Type 2 where needed) Facts — grain-level transactional tables Marts — pre-aggregated tables for specific business domains Handling Slowly Changing Dimensions # SCD Type 2 was required for customer and product dimensions where historical accuracy mattered for reporting.\nImplementation approach:\nUsed dbt snapshots to track row-level changes with valid_from / valid_to date columns Surrogate keys on all dimension tables (not business keys) Grain documented in YAML and enforced with dbt tests Performance Optimization # Clustering keys on fact tables by date (the most common filter) Materialization strategy:\nHeavy intermediate models → incremental Dimensions → table (full refresh nightly) Marts → table (full refresh nightly) Query patterns: 
pre-aggregating common joins into mart tables rather than making analysts do it at query time\nWhat I Learned # Agree on metric definitions before building anything — retroactive changes are expensive SCD Type 2 sounds simple until you need to query point-in-time correctly Incremental models need careful thought about late-arriving data Tech Stack # Component Tool Warehouse Snowflake Transformation dbt Orchestration Airflow Source systems PostgreSQL (multiple) BI Metabase ","date":"15 June 2025","externalUrl":null,"permalink":"/projects/data-warehouse-design/","section":"Projects","summary":"","title":"Data Warehouse Design \u0026 Modeling","type":"projects"},{"content":"","date":"15 June 2025","externalUrl":null,"permalink":"/tags/data-modeling/","section":"Tags","summary":"","title":"Data-Modeling","type":"tags"},{"content":"","date":"15 June 2025","externalUrl":null,"permalink":"/tags/snowflake/","section":"Tags","summary":"","title":"Snowflake","type":"tags"},{"content":" Hey, I\u0026rsquo;m Albin Cikaj # I\u0026rsquo;m a Data Engineer based in Europe with a background in Mechanical Engineering. My path from physical systems to data systems wasn\u0026rsquo;t accidental — both disciplines share the same core obsession: understanding how things work, optimizing them, and building reliable pipelines (whether they carry fluids or data).\nI spend most of my time designing and building data infrastructure — ETL/ELT pipelines, data warehouses, and the tooling that makes data trustworthy and accessible for teams.\nBackground # I studied Mechanical Engineering before transitioning into the data space. That engineering mindset — thinking in systems, caring about reliability, and building things that don\u0026rsquo;t break — has shaped how I approach data problems.\nOver time I\u0026rsquo;ve worked across the full data stack: from ingestion and transformation to modeling and serving. 
I care a lot about data quality, reproducible pipelines, and writing infrastructure that future-me won\u0026rsquo;t curse at.\nWhat I Work With # Languages: Python, SQL Data Engineering: dbt, Apache Spark, Airflow / Prefect, Kafka Cloud \u0026amp; Infra: AWS, GCP, Terraform, Docker Databases \u0026amp; Warehouses: PostgreSQL, Snowflake, BigQuery, DuckDB Version Control \u0026amp; CI/CD: Git, GitHub Actions Outside of Work # When I\u0026rsquo;m not building pipelines, I\u0026rsquo;m usually:\nReading about systems design and distributed systems Tinkering with personal projects that scratch some technical itch Thinking about mechanical things — engines, structures, how physical systems fail Get in Touch # The best way to reach me is via email: albincikaj@gmail.com\nI\u0026rsquo;m always open to conversations about data engineering, interesting projects, or just a good technical discussion.\n","externalUrl":null,"permalink":"/about/","section":"Albin Cikaj","summary":"","title":"About Me","type":"page"},{"content":"","externalUrl":null,"permalink":"/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"","externalUrl":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"},{"content":"","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"}]