Ikbar Faiz — Data Workflows & Reproducible Systems

ABOUT

Engineering Signal First,
Context Second.

What I Build

I build systems that take raw public data through explicit ingest, metadata handling, validation, and analysis-ready outputs.

Across projects, the recurring pattern is staged processing, rerunnable artifacts, and enough provenance to inspect what changed between raw inputs and derived results.

The operating domain here is scientific data, but the transferable value is workflow design, validation discipline, and technical ownership over messy datasets.
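One way to keep "what changed between raw inputs and derived results" inspectable is a per-stage manifest of content hashes. The sketch below is a minimal illustration of that idea. The function names and manifest layout are hypothetical and not taken from any of the repositories listed here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(stage: str, inputs: list[Path], outputs: list[Path]) -> dict:
    """Record the content hash of every file a stage read and wrote (hypothetical layout)."""
    return {
        "stage": stage,
        "inputs": {str(p): file_digest(p) for p in inputs},
        "outputs": {str(p): file_digest(p) for p in outputs},
    }


def diff_manifests(old: dict, new: dict) -> dict:
    """Report files that changed, appeared, or disappeared between two runs of a stage."""
    report = {}
    for side in ("inputs", "outputs"):
        a, b = old[side], new[side]
        report[side] = {
            "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
        }
    return report
```

In practice the manifest would be saved next to the stage's outputs (for example as JSON), so a later rerun can be diffed against it. An output hash that changes while all input hashes stay the same points at a change in the stage code or its environment.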

How This Site Is Structured

Completed work is separated from active systems so finished case studies stay legible and ongoing builds stay honest.

Repositories only move into the curated project set when the workflow, artifacts, and validation story are clear enough to stand on their own.

Active systems stay in a lighter log format so platform direction, tooling, and unfinished work can remain visible without pretending to be complete.

STRUCTURE: COMPLETED SEPARATED FROM ACTIVE
STANDARD: WORKFLOW + ARTIFACTS + VALIDATION

VALIDATED WORK

Data Workflows

I build Python-centered workflows that take messy records through ingest, metadata shaping, validation, and structured outputs. The datasets are scientific, but the signal is engineering discipline, reproducibility, and traceable execution.

01

IRIS Archive Ingestion & Metadata Pipeline completed

A metadata-aware Python workflow for querying, downloading, indexing, and analyzing public IRIS observation files.

The system keeps raw inputs, per-observation lineage, duplicate-obsid handling, and derived outputs coordinated inside one reproducible pipeline. For IRIS Level 2 data it covers archive query, obs_dir-aware storage, metadata indexing, per-observation outputs, and merged event artifacts.

Repository: iris-solar-uv-data

Stack: Python / Astropy / SunPy / NumPy / Pandas / PyArrow

Workflow: Query → Download → Index → Analyze → Output

Validation: duplicate-obsid handling, per-observation lineage, merged artifacts

View Repository
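The duplicate-obsid problem can be pictured as follows. The same OBSID can be observed in more than one window, so outputs keyed only by OBSID would overwrite each other. The sketch assumes observation directories are named `YYYYMMDD_HHMMSS_OBSID`, and keys storage by the full obs_dir name. All function names and the example directory names in the test are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ObsDir:
    name: str          # full directory name, unique per observation window
    start: datetime    # observation start parsed from the name
    obsid: str         # OBSID, which can repeat across windows


def parse_obs_dir(name: str) -> ObsDir:
    """Parse a directory name assumed to follow YYYYMMDD_HHMMSS_OBSID."""
    date_part, time_part, obsid = name.split("_", 2)
    start = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return ObsDir(name=name, start=start, obsid=obsid)


def index_by_obsid(names: list[str]) -> dict[str, list[ObsDir]]:
    """Group observation directories by OBSID, each group sorted by start time."""
    groups: dict[str, list[ObsDir]] = defaultdict(list)
    for name in names:
        obs = parse_obs_dir(name)
        groups[obs.obsid].append(obs)
    return {obsid: sorted(dirs, key=lambda o: o.start) for obsid, dirs in groups.items()}


def duplicate_obsids(index: dict[str, list[ObsDir]]) -> list[str]:
    """OBSIDs that map to more than one observation directory."""
    return sorted(obsid for obsid, dirs in index.items() if len(dirs) > 1)


def output_dir(root: str, obs: ObsDir) -> PurePosixPath:
    """Key outputs by obs_dir under the OBSID, so repeated OBSIDs never overwrite each other."""
    return PurePosixPath(root) / obs.obsid / obs.name
```

Keying by obs_dir keeps each observation window's lineage separate. The duplicate list then becomes an explicit audit target rather than a silent collision.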

02

Rubin Sampling: Gaia-to-ZTF Period Recovery completed

A reproducible Python pipeline for ingesting public survey light curves, standardizing them into parquet artifacts, and evaluating period recovery against Gaia DR3 truth data.

It is a public baseline pipeline for live ingest, standardized time-series outputs, and evaluation under Rubin-like cadence constraints. Ingest behavior, failure modes, and baseline results stay visible instead of being buried behind final figures.

Stack: Python / Astropy / Pandas / Gaia DR3 / ZTF

View repo →
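Evaluating period recovery against Gaia DR3 truth data comes down to a scoring rule. The sketch below is an assumption about how such a rule could look, not the repository's actual criterion. It treats a recovered period within a fractional tolerance of the true period as exact, and one near half or double the true period as a harmonic alias. The 1% tolerance, the harmonic set, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recovery:
    source_id: str
    true_period: float
    recovered_period: float
    match: str  # "exact", "alias", or "miss"


def classify(true_p: float, rec_p: float, tol: float = 0.01,
             harmonics: tuple[float, ...] = (0.5, 2.0)) -> str:
    """Label a recovered period as exact, a harmonic alias, or a miss (fractional tolerance)."""
    if abs(rec_p - true_p) / true_p <= tol:
        return "exact"
    for h in harmonics:
        target = h * true_p
        if abs(rec_p - target) / target <= tol:
            return "alias"
    return "miss"


def recovery_rate(pairs: list[tuple[str, float, float]],
                  tol: float = 0.01) -> tuple[float, list[Recovery]]:
    """Fraction of (source_id, true_period, recovered_period) rows recovered exactly."""
    results = [Recovery(sid, t, r, classify(t, r, tol)) for sid, t, r in pairs]
    if not results:
        return 0.0, results
    exact = sum(r.match == "exact" for r in results)
    return exact / len(results), results
```

Keeping aliases as a separate label, rather than folding them into misses, shows whether sparse cadence is shifting power to harmonics or losing the signal entirely.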

03

T CrB Photometry & Raw-Image Pipeline completed

A reproducible Python workflow for ingesting, cleaning, binning, validating, and packaging multi-source photometry plus supporting raw-image assets.

It produces standardized outputs, validates overlaps between sources, and keeps archive-aware provenance, so analysis products, cross-source checks, and provenance records stay aligned inside one explicit pipeline.

Stack: Python / Pandas / NumPy / Matplotlib / Jupyter

View repo →
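Binning and overlap validation can be illustrated together. The sketch bins each source's photometry into fixed-width time bins, then compares the two sources only where both have data. The bin width, the mean-per-bin rule, and the median-offset check are assumptions for illustration, not the project's documented method. Times are in whatever unit the source uses (for example days).

```python
import math
from collections import defaultdict
from statistics import median
from typing import Optional


def bin_photometry(times: list[float], mags: list[float], width: float) -> dict[int, float]:
    """Average magnitudes into fixed-width time bins, keyed by integer bin index."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for t, m in zip(times, mags):
        buckets[math.floor(t / width)].append(m)
    return {k: sum(v) / len(v) for k, v in buckets.items()}


def overlap_offset(a: dict[int, float], b: dict[int, float]) -> tuple[int, Optional[float]]:
    """Count shared bins and return the median magnitude offset (a - b) across them."""
    shared = sorted(a.keys() & b.keys())
    if not shared:
        return 0, None
    return len(shared), median([a[k] - b[k] for k in shared])
```

A large or drifting offset between two sources would suggest a calibration or zero-point mismatch worth inspecting before the sources are merged.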

04

RUWE Radial KS Clustering completed

A reusable Python workflow for clustering, membership inference, quality inspection, and cross-dataset comparison on Gaia-based tables.

It turns notebook-heavy analysis into a CLI-backed clustering and data-quality pipeline for Gaia-derived tables, with clearer reruns, preserved notebook provenance, multi-cluster outputs, and a public workflow structure.

Stack: Python / NumPy / Pandas / scikit-learn / Matplotlib

View repo →
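The project name suggests a Kolmogorov-Smirnov comparison of radial distributions split by RUWE. The sketch below assumes exactly that: it compares radial distances of stars above and below a RUWE cut with a two-sample KS statistic. The 1.4 threshold is a commonly used Gaia RUWE cut, used here as an assumption, and the function names and `(ruwe, radial_distance)` input shape are hypothetical.

```python
from bisect import bisect_right


def ks_two_sample(x: list[float], y: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both ECDFs are right-continuous step functions that only jump at sample values,
    # so the supremum of |F_x - F_y| is reached at one of those values.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def radial_ks_by_ruwe(stars: list[tuple[float, float]], threshold: float = 1.4) -> float:
    """Compare radial distances of stars above vs. at-or-below a RUWE cut.

    Each star is a (ruwe, radial_distance) pair.
    """
    high = [r for ruwe, r in stars if ruwe > threshold]
    low = [r for ruwe, r in stars if ruwe <= threshold]
    return ks_two_sample(high, low)
```

For real analysis, `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value. The hand-rolled version only shows what the statistic measures.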

05

HLSP / MAST Metadata Pipeline completed

A metadata-first multi-archive ingestion pipeline for HLSP collections in MAST.

Programmatic access to HLSP data is strong, but production-style ingestion still needs explicit contracts around observation metadata, product manifests, rerun safety, snapshot history, and queryable downstream storage. The repo is structured around three core entities (collections, observations, and products) and three storage layers: bronze raw snapshots, silver normalized Parquet, and gold latest views.

Stack: Python / astroquery.mast / DuckDB / Parquet

View repo →
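The bronze/silver/gold layering and rerun safety can be sketched in a few lines. The real repo uses astroquery.mast, DuckDB, and Parquet. This sketch swaps those for in-memory Python structures to show the contract only. The class name and the field names (`obs_id`, `products`, `snapshot_ts`) are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class LayeredStore:
    bronze: dict = field(default_factory=dict)  # content hash -> raw snapshot (append-only)
    silver: list = field(default_factory=list)  # one normalized row per observation per snapshot

    def ingest(self, snapshot_ts: str, collection: str, records: list[dict]) -> bool:
        """Land a raw snapshot once per unique content; return False on an idempotent rerun."""
        blob = json.dumps({"collection": collection, "records": records}, sort_keys=True)
        key = hashlib.sha256(blob.encode("utf-8")).hexdigest()
        if key in self.bronze:
            return False  # identical snapshot already stored: rerun is a no-op
        self.bronze[key] = {"snapshot_ts": snapshot_ts, "collection": collection, "records": records}
        for rec in records:
            self.silver.append({
                "collection": collection,
                "obs_id": rec["obs_id"],
                "snapshot_ts": snapshot_ts,
                "n_products": len(rec.get("products", [])),
            })
        return True

    def gold_latest(self) -> dict:
        """Latest silver row per (collection, obs_id); ISO-8601 timestamps compare as strings."""
        latest = {}
        for row in self.silver:
            key = (row["collection"], row["obs_id"])
            if key not in latest or row["snapshot_ts"] > latest[key]["snapshot_ts"]:
                latest[key] = row
        return latest
```

Hashing the raw content makes reruns safe without a separate job-state table. Keeping every silver row preserves snapshot history, while the gold view answers "what is current" on top of that history.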

06

Artemis II Archive completed

A technical-editorial mission archive for Artemis II, built as a longform, single-page experience.

The structure combines mission chronology, public NASA update links, curated physics explainers, official NASA imagery, and explicit source methodology. It is built on a local JSON data backbone curated from NASA public sources.

Stack: Next.js / TypeScript / Tailwind CSS

View site →

COUNT: 6 COMPLETED SYSTEMS
DOMAIN: ASTRONOMY / SCIENTIFIC DATA / AEROSPACE
LANGUAGE: PYTHON / TYPESCRIPT

ACTIVE BUILD PATH

Current Systems

Active builds are logged separately from finished case studies so iteration, architecture decisions, and public-facing system work stay visible without being overstated.

Astrolyte

active system

Live Site GitHub →

Building Astrolyte as a public-facing data platform where ingest, indexing, processing, validation, and dataset surfacing are treated as explicit stages rather than hidden implementation detail.

Why It Exists: A lot of technical work stops at final analysis output. Astrolyte exists to keep source records, metadata, processed artifacts, and validation surfaces legible enough that the workflow can grow into a real platform instead of remaining a collection of isolated repositories.

Current Build: The current surface is live and organized around three existing workflow lineages: IRIS, Rubin Sampling, and the T CrB project. Those lineages are the first source lanes, not the final shape.

Architecture: Keep ingest, metadata indexing, processed outputs, and validation visible as separate stages. Treat the site as the public surface of real workflow systems. Design the structure so new data lanes can be added without rethinking the whole platform.

Next.js / TypeScript / Data Platform / Workflow Systems / Observational Data
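The architecture goal is that new data lanes can be added without rethinking the whole platform. One way to express that is a registry of lanes that all implement the same ordered stages. Astrolyte itself is a Next.js/TypeScript site. This Python sketch only illustrates the lane contract. The stage names, `Lane`, `register`, and `run_lane` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Shared stage order every lane must implement (hypothetical names).
STAGE_ORDER = ("ingest", "index", "process", "validate")
Stage = Callable[[dict], dict]


@dataclass
class Lane:
    name: str
    stages: dict[str, Stage]  # stage name -> callable taking and returning a context dict

    def __post_init__(self) -> None:
        missing = [s for s in STAGE_ORDER if s not in self.stages]
        if missing:
            raise ValueError(f"lane {self.name!r} is missing stages: {missing}")


REGISTRY: dict[str, Lane] = {}


def register(lane: Lane) -> None:
    """Adding a lane only touches the registry, not the runner."""
    REGISTRY[lane.name] = lane


def run_lane(name: str, ctx: dict) -> dict[str, dict]:
    """Run a lane's stages in order, keeping each stage's output as its own artifact."""
    lane = REGISTRY[name]
    artifacts: dict[str, dict] = {}
    for stage in STAGE_ORDER:
        ctx = lane.stages[stage](ctx)
        artifacts[stage] = ctx
    return artifacts
```

Because every stage's output is returned separately, a public surface could render ingest, indexing, processing, and validation as distinct views rather than only the final result.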

COUNT: 1 ACTIVE SYSTEM
STATUS: ONGOING BUILD

CORE AREAS

Technical Focus

The grouping is engineering-first: workflow handling, structured execution, and reproducibility come before the domain context.

01

Data Workflows

Ingest, standardize, validate, and package messy public data into structured artifacts that can be rerun and inspected.

02

Workflow Systems

Design multi-stage paths from raw records to derived outputs, with explicit checkpoints, provenance, and visible validation surfaces.

03

Research Software

Build domain-specific tools where the scientific context is the operating domain and engineering discipline stays in the foreground.

AREAS: 3 CORE DOMAINS
APPROACH: ENGINEERING-FIRST

SUPPORTING SIGNAL

GitHub Activity

Recent public repository activity, kept secondary to the curated case-study set but useful as a consistency check.

astrolyte TypeScript

Astrolyte website and public surface for workflow-oriented scientific data systems.

Updated Apr 4, 2026

portfolio-site TypeScript

Personal portfolio for workflow-heavy data engineering and scientific data systems.

Updated Apr 4, 2026

iris-solar-uv-data Python

Reproducible IRIS Level 2 workflow for archive discovery, metadata indexing, per-OBS quicklooks, and duplicate-window ROI audits.

Updated Apr 3, 2026

t-crb-project Python

Reproducible T CrB observational workflow with clean products, overlap validation, and archive-aware support assets.

Updated Apr 3, 2026

REPOS: 7 PUBLIC
PRIMARY: PYTHON / TYPESCRIPT

CONTACT

STATUS: AVAILABLE
LOCATION: Jakarta, Indonesia
FOCUS: DATA ENGINEERING / BACKEND

Reach out
directly.

If you want to talk about data workflows, backend-oriented builds, internships, or collaboration, this is the fastest way to reach me.

Phone +62 821 3409 9169

Email ikbarfaiz14@gmail.com

GitHub github.com/arsenelupin14

LinkedIn linkedin.com/in/ikbarfaiz

Email Me Download CV
