Airbyte, creators of a fast-growing open-source data integration platform, has released the results of the biggest data engineering survey in the market, which provides insights into the latest trends, tools, and practices in data engineering, especially the adoption of tools in the modern data stack.
Its first worldwide State of Data survey displays results in an interactive format so that anyone can drill further into the information using filters to see, for example, adoption patterns by organization size. The survey drew 886 respondents, the largest number for a survey related to data engineering, fairly evenly distributed by geography (North America, Europe, and Asia), company size, and years of working experience. The most common job title was data engineer at 38%, followed by management positions at 20% and software engineers at 11%. Analytics engineers, data analysts, and data scientists accounted for around 5% apiece.
“In the past year, the data ecosystem has been evolving rapidly, so this research of the user community is a way to see the signal through the noise in the modern data stack,” said John Lafleur, co-founder and chief operating officer, Airbyte. “New options are introduced every month, so this research is a way for us to take a step back and understand what the community is using and feeling excited about.”
Noteworthy findings include the following.
- Nearly half the respondents were looking to hire for their data teams, with consistent results across worldwide geographic regions.
- In terms of compensation, larger company size correlates with higher pay, and North America has the highest salaries.
- For the Data Ingestion category of the modern data stack, the clear leaders are Airbyte and Fivetran. Airbyte shows double the number of people who want to try it. In terms of company size, Airbyte is strong in the small/medium-sized segment, with less adoption in the mid-size market (500-1,000 employees). However, the enterprise segment (1,000+ employees) shows a propensity to adopt an open-source self-hosted platform, with Airbyte Open Source being the dominant solution there.
- For Data Transformation, Pandas is the most used, while dbt shows the most “want to try” among respondents. This is even more noticeable in the larger organization segments, where both Spark and Pandas are used more than dbt, yet dbt still shows the most “want to try” among those users.
- The most used data warehouses are Snowflake and Google BigQuery, followed by AWS Redshift and Databricks, with Azure Synapse lagging behind. In the larger organization segments, Databricks' popularity is nearly on par with that of Snowflake and BigQuery.
- For Data Orchestration, most people are still using self-hosted Airflow, especially in the enterprise segment, which may again (as in Data Ingestion) indicate a preference for self-hosted deployments among larger organizations. Dagster and Prefect show lots of interest, and Dagster is coming up the ranks with the highest number of “want to try” responses.
- For Business Intelligence, the leaders are Looker and Tableau, but newer technologies are close behind and show lots of interest.
- For Data Quality, the leaders are Great Expectations and Monte Carlo, with a lack of awareness of other alternatives.
- For Reverse ETL, it was essentially a tie between Hightouch and Census as leaders, with the field pretty much open after that.
- For Data Catalogs, three tools lead the way in terms of popularity: DataHub, Alation, and Amundsen.
To view the full results of the survey, go to https://state-of-data.com.
Here is what some data engineering influencers said about the State of Data survey:
“The data engineering community stands out for its open-mindedness and collaborative spirit. Every day, I’m impressed by how we’ve created a culture of learning and sharing that transcends organizational boundaries and geographical constraints.” Ananth Packkildurai, editor, Data Engineering Weekly
“Amazing Data Engineering survey! I highly recommend checking out the insights into the adoption of engineering tools from Data Ingestion, transformation to reverse ETL and Data Catalogs. That section was my highlight. Congratulations to Airbyte for leading the Data Ingestion section.” Andreas Kretz, founder of Learn Data Engineering
“I am particularly happy to see the growth of Data Quality tools that have evolved for good. This signals maturity is coming along. It’s not a shocker to me Airbyte still leading the way for the Data Ingestion Layer.” Ravit Jain, founder & host of The Ravit Show