BINUS University — Computer Science
IKBAR
FAIZ

Data Engineer // Astrophysics Pipeline // Reproducible Systems

// Messy data in. Structured artifacts out.

Engineering Signal First,
Context Second.

Ikbar Faiz

What I Build

I build systems that take raw public data through explicit ingest, metadata handling, and validation to analysis-ready outputs.

Across projects, the recurring pattern is staged processing, rerunnable artifacts, and enough provenance to inspect what changed between raw inputs and derived results.

The operating domain here is scientific data, but the transferable value is workflow design, validation discipline, and technical ownership over messy datasets.

How This Site Is Structured

Completed work is separated from active systems so finished case studies stay legible and ongoing builds stay honest.

Repositories only move into the curated project set when the workflow, artifacts, and validation story are clear enough to stand on their own.

Active systems stay in a lighter log format so platform direction, tooling, and unfinished work can remain visible without pretending to be complete.

Data Workflows

I build Python-centered workflows that take messy records through ingest, metadata shaping, and validation into structured outputs. The datasets are scientific, but the signal is engineering discipline, reproducibility, and traceable execution.

02
Rubin Sampling: Gaia-to-ZTF Period Recovery (completed)

A reproducible Python pipeline for ingesting public survey light curves, standardizing them into parquet artifacts, and evaluating period recovery against Gaia DR3 truth data.

Public baseline pipeline for live ingest, standardized time-series outputs, and evaluation under Rubin-like cadence constraints. The system keeps ingest behavior, failure modes, and baseline results visible instead of burying them behind final figures.

Stack: Python, Astropy, Pandas, Gaia DR3, ZTF
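
A minimal sketch of how "period recovery against truth data" could be scored. The site does not say how the project defines a match, so the relative tolerance, the half/double-period alias ratios, and the names `check_recovery` and `recovery_fraction` are all assumptions made for illustration, not the repository's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryResult:
    source_id: str
    matched: bool
    ratio: float  # alias ratio that matched, or 0.0 when nothing matched


def check_recovery(source_id, recovered, truth, rel_tol=0.01, ratios=(1.0, 0.5, 2.0)):
    """Compare a recovered period with a reference period.

    Assumption: a match means `recovered` lies within `rel_tol` (relative)
    of `truth * ratio` for any ratio in `ratios`, so half and double
    periods count as aliases of the reference value.
    """
    if recovered <= 0 or truth <= 0:
        raise ValueError("periods must be positive")
    for ratio in ratios:
        target = truth * ratio
        if abs(recovered - target) <= rel_tol * target:
            return RecoveryResult(source_id, True, ratio)
    return RecoveryResult(source_id, False, 0.0)


def recovery_fraction(results):
    """Fraction of sources whose period was recovered (0.0 for no input)."""
    results = list(results)
    if not results:
        return 0.0
    return sum(r.matched for r in results) / len(results)
```

Keeping the matched ratio in the result, rather than a bare boolean, leaves alias matches visible in the output instead of folding them into a single headline number.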
03
T CrB Photometry & Raw-Image Pipeline (completed)

A reproducible Python workflow for ingesting, cleaning, binning, validating, and packaging multi-source photometry plus supporting raw-image assets.

Reproducible photometry and raw-image workflow with standardized outputs, overlap validation, and archive-aware provenance. The project keeps analysis products, cross-source checks, and provenance records aligned inside one explicit pipeline.

Stack: Python, Pandas, NumPy, Matplotlib, Jupyter
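
A rough sketch of the binning and overlap-validation idea, using only the standard library so it runs on its own. The fixed-width bins, the `(time, magnitude)` tuple layout, and the function names `bin_photometry` and `overlap_offsets` are assumptions for illustration; the project may bin and compare sources differently.

```python
from collections import defaultdict
from math import floor
from statistics import fmean


def bin_photometry(points, width):
    """Group (time, magnitude) pairs into fixed-width time bins.

    Returns {bin_index: (mean_time, mean_magnitude, n_points)}.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    groups = defaultdict(list)
    for t, mag in points:
        groups[floor(t / width)].append((t, mag))
    return {
        idx: (
            fmean([t for t, _ in rows]),
            fmean([m for _, m in rows]),
            len(rows),
        )
        for idx, rows in sorted(groups.items())
    }


def overlap_offsets(binned_a, binned_b):
    """Mean-magnitude difference (a - b) in every bin both sources cover."""
    shared = sorted(binned_a.keys() & binned_b.keys())
    return {idx: binned_a[idx][1] - binned_b[idx][1] for idx in shared}
```

Only bins covered by both sources are compared, so the offsets describe the overlap window and nothing outside it.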
04
RUWE Radial KS Clustering (completed)

A reusable Python workflow for clustering, membership inference, quality inspection, and cross-dataset comparison on Gaia-based tables.

CLI-backed clustering and data-quality workflow for Gaia-derived tables, with preserved notebook provenance and multi-cluster outputs. It turns notebook-heavy analysis into a pipeline with clearer reruns and a public workflow structure.

Stack: Python, NumPy, Pandas, scikit-learn, Matplotlib
05
HLSP / MAST Metadata Pipeline (completed)

A metadata-first, multi-archive ingestion pipeline for HLSP collections in MAST.

Programmatic access to HLSP data is strong, but production-style ingestion still needs explicit contracts around observation metadata, product manifests, rerun safety, snapshot history, and queryable downstream storage. The repo is structured around three core entities: collections, observations, and products, with bronze raw snapshots, silver normalized parquet, and gold latest views.

Stack: Python, astroquery.mast, DuckDB, Parquet
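
The bronze, silver, and gold layering described above can be sketched in plain Python. The repo itself lists DuckDB and Parquet; this sketch avoids both so it runs standalone. Every field name, the snapshot dict layout, and the `ProductRow`, `to_silver`, and `to_gold` names are hypothetical and do not reflect the repository's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductRow:
    """Silver-layer record (illustrative fields, not the repo's schema)."""
    collection: str
    observation_id: str
    product_uri: str
    size_bytes: int
    snapshot: str  # ISO-8601 timestamp of the bronze snapshot it came from


def to_silver(bronze_snapshot):
    """Normalize one raw (bronze) snapshot dict into typed rows."""
    stamp = bronze_snapshot["fetched_at"]
    return [
        ProductRow(
            collection=raw["collection"].strip().lower(),
            observation_id=str(raw["obs_id"]),
            product_uri=raw["uri"].strip(),
            size_bytes=int(raw.get("size", 0)),
            snapshot=stamp,
        )
        for raw in bronze_snapshot["records"]
    ]


def to_gold(silver_rows):
    """Keep the newest row per product_uri (a 'latest' view).

    Assumes all snapshot timestamps share one ISO-8601 format, so plain
    string comparison orders them chronologically.
    """
    latest = {}
    for row in silver_rows:
        current = latest.get(row.product_uri)
        if current is None or row.snapshot > current.snapshot:
            latest[row.product_uri] = row
    return sorted(latest.values(), key=lambda r: r.product_uri)
```

Because the gold view is derived purely from silver rows, feeding the same snapshots in twice yields the same result, which is one simple way to read "rerun safety".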
06
Artemis II Archive (completed)

A technical-editorial mission archive for Artemis II, built as a longform, single-page experience.

The structure combines mission chronology, public NASA update links, curated physics explainers, official NASA imagery, and explicit source methodology. Built with a local JSON data backbone curated from NASA public sources.

Stack: Next.js, TypeScript, Tailwind CSS

Current Systems

Active builds are logged separately from finished case studies so iteration, architecture decisions, and public-facing system work stay visible without being overstated.

Astrolyte

active system

Building Astrolyte as a public-facing data platform where ingest, indexing, processing, validation, and dataset surfacing are treated as explicit stages rather than hidden implementation detail.

Why It Exists

A lot of technical work stops at final analysis output. Astrolyte exists to keep source records, metadata, processed artifacts, and validation surfaces legible enough that the workflow can grow into a real platform instead of remaining a collection of isolated repositories.

Current Build

The current surface is live and organized around three existing workflow lineages: IRIS, Rubin Sampling, and the T CrB project. Those lineages are the first source lanes, not the final shape.

Architecture

Keep ingest, metadata indexing, processed outputs, and validation visible as separate stages. Treat the site as the public surface of real workflow systems. Design the structure so new data lanes can be added without rethinking the whole platform.
Stack: Next.js, TypeScript, Data Platform, Workflow Systems, Observational Data

Technical Focus

The grouping is engineering-first: workflow handling, structured execution, and reproducibility come before the domain context.

01

Data Workflows

Ingest, standardize, validate, and package messy public data into structured artifacts that can be rerun and inspected.

02

Workflow Systems

Design multi-stage paths from raw records to derived outputs, with explicit checkpoints, provenance, and visible validation surfaces.
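
One way to picture "explicit checkpoints, provenance" is a runner that hashes the data before and after each stage. This is a sketch under assumptions, not code from any of the projects: `fingerprint`, `run_stages`, and the two example stages are hypothetical names, and the data is assumed to be JSON-serializable.

```python
import hashlib
import json


def fingerprint(payload):
    """Stable SHA-256 hex digest of any JSON-serializable value."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def run_stages(raw, stages):
    """Run (name, function) stages in order, logging input/output hashes."""
    data, log = raw, []
    for name, func in stages:
        before = fingerprint(data)
        data = func(data)
        log.append({"stage": name, "input": before, "output": fingerprint(data)})
    return data, log


def drop_missing(rows):
    """Example validation stage: discard rows without a flux value."""
    return [r for r in rows if r.get("flux") is not None]


def round_flux(rows):
    """Example standardization stage: round flux to two decimals."""
    return [{**r, "flux": round(r["flux"], 2)} for r in rows]
```

Each log entry's output hash equals the next entry's input hash, so a rerun that produces a different log shows which stage first diverged.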

03

Research Software

Build domain-specific tools where the scientific context is the operating domain and engineering discipline stays in the foreground.

GitHub Activity

Recent public repository movement, kept secondary to the curated case-study set but useful as a consistency check.

STATUS: AVAILABLE
LOCATION: Jakarta, Indonesia
FOCUS: DATA ENGINEERING / BACKEND

Reach out
directly.

If you want to talk about data workflows, backend-oriented builds, internships, or collaboration, this is the fastest way to reach me.