Mannlowe Ops - Airbyte or Meltano?
By Advait Sakhalkar on April 22, 2024
Both of them are open-source data integration tools with cloud offerings. Which one is better, or a better fit for you? This is Jove, cofounder at Timeplus, a streaming database company (with a cloud offering). I’ve been using both for over a year. Not every day, not every week. Maybe 1–3 times a month, as an end user AND a connector developer. If you are as lazy as me, here is a guide on how to choose one over the other.
Let’s start with the cover image:
Both are nice. Personally I like Meltano more.
I do have some connections (I met their founders and teams in person at a few tech conferences). This is not an endorsement or sponsored content. As a beginner user of both Airbyte and Meltano, I just want to share what I like and what I struggle with.
You read that right. I lean more towards Meltano, even though they haven’t raised as much 💰 as Airbyte. Well, as a user and a potential paying customer, should I care about the VC 💰? Maybe a little (I don’t want the tool to be sunset after I have set everything up), but not too much.
I don’t expect to write a 10-page report. Let’s go over those rows in my comparison table now.
Open Source License
License FAQ | Airbyte Documentation
Airbyte Licensing Overview
Let’s face it. Not everyone is happy with Airbyte’s license, or even with how they organize the source code. The platform code is under Elastic License v2, which basically means source-available: you can use it and change it, but you cannot offer it to others as a paid managed service. Anyone can contribute connectors to Airbyte, and the code will then be maintained by Airbyte, not the original developers. This is an interesting design. It is supposed to solve both the long-tail issue and the lack of maintenance for such a large number of connectors.
meltano/LICENSE at main · meltano/meltano
Extract & Load with joy - CLI & version control for ELT without limitations. No more black box. Let your creativity…
Meltano simply chooses MIT for everything.
Cloud Pricing
Both can be deployed locally or on-prem, with Docker or k8s. If you don’t want to set up and update them yourself, you can consider purchasing their cloud offerings.
Pricing is a big topic and there are just too many options. After a few iterations, Airbyte Cloud now accepts signups from everyone. If you only use non-GA connectors, it’s FREE. For example, the Timeplus destination connector is in the alpha stage, so you can use it for free in Airbyte Cloud. But if you need a GA connector like HubSpot (as I do), you need to add a credit card and purchase some credits.
Pricing | Airbyte - Open-source data integration
Airbyte offers the first transparent and scalable pricing across ETL / ELT. Based on compute time, Airbyte enables you…
Buy credits before you run any GA sync:
The minimum purchase is 20 credits, i.e. USD 50. This will allow you to sync ~1.3 million rows.
For Meltano, the pricing model is more complicated. Daily jobs and hourly jobs are charged differently.
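To put those Airbyte Cloud numbers in perspective, here is a rough back-of-the-envelope sketch using only the three figures quoted above (20 credits, USD 50, ~1.3 million rows). The function name is made up for illustration, and the assumption that cost scales linearly with row count is mine, not something Airbyte states here.

```python
def credit_math(credits: float, usd: float, rows: float) -> dict:
    """Derive unit prices from one quoted bundle, assuming linear scaling."""
    return {
        "usd_per_credit": usd / credits,
        "rows_per_credit": rows / credits,
        "usd_per_million_rows": usd / (rows / 1_000_000),
    }


# Figures quoted in the text; the row count is approximate.
summary = credit_math(credits=20, usd=50, rows=1_300_000)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

So the minimum bundle works out to USD 2.50 per credit, or somewhere under USD 40 per million rows if the ~1.3 million estimate holds for your data.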
Meltano Cloud Fees | Meltano Documentation
Meltano Cloud is currently in Beta.
BTW, there is a discount if you purchase Meltano Cloud credits while it’s still in beta.
Web UI, CLI and Yaml
Clearly Airbyte’s web UI is way better than Meltano’s. And clearly Meltano’s CLI is way better than Airbyte’s.
At the very beginning, I was not used to the meltano and meltano-cloud commands for adding plugins, setting config, doing dry runs, and managing cloud deployments and schedules. Later on, I came to enjoy using the CLI.
For Airbyte, my understanding is that most of the configuration needs to be done via the web UI. The octavia CLI is still in the alpha phase.
CLI documentation | Airbyte Documentation
I like Meltano’s Infrastructure-as-Code (IaC) approach a lot. You can define almost everything in the yaml file. In fact, in most cases the CLI just helps you update the yaml file, except for sensitive values such as API keys.
For Meltano Cloud, I just need to link the GitHub repo to the cloud account. I make most changes in the yaml and trigger runs with meltano-cloud.
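For readers who have not tried the CLI workflow, here is a minimal sketch of the kind of command sequence I mean, written as a small Python helper that only builds and prints the commands (it does not run them). The plugin names and the start_date setting are placeholders chosen for illustration; check Meltano Hub for the settings your tap actually supports.

```python
import shlex
from typing import List


def meltano_setup_commands(tap: str, target: str, setting: str, value: str) -> List[List[str]]:
    """Build (but do not execute) Meltano CLI calls for a basic pipeline."""
    return [
        ["meltano", "add", "extractor", tap],               # adds the tap to meltano.yml
        ["meltano", "add", "loader", target],               # adds the target to meltano.yml
        ["meltano", "config", tap, "set", setting, value],  # stores a plugin setting
        ["meltano", "run", "--dry-run", tap, target],       # validate without executing
        ["meltano", "run", tap, target],                    # run the pipeline
    ]


if __name__ == "__main__":
    # Placeholder plugin names and setting, for illustration only.
    for argv in meltano_setup_commands("tap-hubspot", "target-timeplus", "start_date", "2024-01-01"):
        print(shlex.join(argv))
```

Most of these commands end up as edits to the project yaml, which is why committing that file to the linked repo is usually enough to reproduce the setup elsewhere.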
Quality vs Quantity
https://airbyte.com/connectors
There are 350+ connectors maintained by the Airbyte team.
On hub.meltano.com the number is bigger.
So it seems that Meltano wins.
I don’t have strong proof, but from my limited data points, I think Airbyte connectors are generally of better quality. For example, one of my pipelines sends HubSpot data to my Timeplus workspace.
With Airbyte, I can get many properties from HubSpot objects, whether built-in or custom-defined.
On Meltano, the list is much shorter.
But still on the topic of quality, I am not impressed by the Airbyte platform.
Maybe it’s just me, but more than 50% of the time when I try something with Airbyte, it fails. I was surprised by various errors:
- OOTB docker compose can miss a file: airbyte/temporal/dynamicconfig/development.yaml
- OOTB k8s helm chart doesn’t really work: cpuRequest=0.5,cpuLimit=
- ..
It almost drives me to purchase credits on Airbyte Cloud. So maybe it’s by design, not a bug?
On the Meltano side, the documentation is a bit overwhelming, but I rarely hit issues when I try something new.
🛟 Tech support
Both of them maintain a Slack community. I don’t want to list the number of members, since I don’t care. Actually, you will see there are a lot of issues reported in the Airbyte Slack. Most of them go unanswered. There is a new AI bot that tries to be helpful, but I have my doubts.
The Meltano community is smaller but much nicer/closer.
To me, it’s a culture thing, a founder thing.
(PS, I built a data connector for my company. I submitted it to both Airbyte and Meltano. One took 6 months, and the other took 12 hours to review and list in the catalog. I am sure you can guess which one is which.)
If you have read this far, hopefully you can see why I like Meltano more now. Here is a text version of the table
As I keep saying, choose the one that fits you, not the one that is supposedly the best in the world. So here is my guide for a lazy data engineer who doesn’t want to spend a lot of time and money looking into the details:
- If you are not comfortable using the command line or editing yaml, or are not that technical, just use Airbyte OSS or even the cloud.
- If you think a nice UI is optional and you are looking for lots of parameters and pipeline tuning, choose Meltano.
- If you are building a new data source, or a new database or destination, build for both platforms. Essentially they use the same or a similar Singer SDK (I could be wrong, but 90% of my connector code for Airbyte and Meltano is the same).
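To illustrate why so much connector code can be shared, here is a minimal sketch for the destination side. It assumes the RECORD message shapes as I understand them from the Singer spec (stream name at the top level, row under "record") and the Airbyte protocol (stream name and row nested under "record", with the row in "data"). The function name is hypothetical and this is not code from my actual connector.

```python
import json
from typing import Any, Dict, Optional, Tuple

Row = Dict[str, Any]


def parse_record(line: str) -> Optional[Tuple[str, Row]]:
    """Return (stream, row) for a RECORD message in either shape, else None."""
    msg = json.loads(line)
    if msg.get("type") != "RECORD":
        return None  # SCHEMA, STATE, LOG, etc. are ignored in this sketch
    body = msg.get("record", {})
    if "stream" in msg:
        # Singer-style: stream name at the top level, row under "record"
        return msg["stream"], body
    # Airbyte-style: stream name and row both nested under "record"
    return body["stream"], body["data"]


if __name__ == "__main__":
    singer = '{"type": "RECORD", "stream": "contacts", "record": {"id": 1}}'
    airbyte = '{"type": "RECORD", "record": {"stream": "contacts", "data": {"id": 1}}}'
    print(parse_record(singer))
    print(parse_record(airbyte))
```

Once both formats are reduced to a (stream, row) pair, the part that actually writes to your system can be the same for both platforms, and only the thin wrapper around it differs.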
Again, this is a personal tech blog. I don’t want to put our company-wide relationship with Airbyte/Meltano at risk. I personally enjoy the free options for syncing lots of data, such as HubSpot, GitHub, CSV, and databases, to my Timeplus databases and building charts there.
BTW, check this if you are wondering why this is on my personal Medium, not the company blog.
Why I use them all: Substack, Medium, Linkedin etc
Why and how I write contents on 9 different platforms
Change log:
July 26: changed references of “meltano” to “Meltano”
Written by Jove Zhong
Co-founder & Head of Product | Timeplus