ETL software maker Fivetran this week released results of a benchmark test it ran comparing the cost and performance of five cloud data warehouses: BigQuery, Databricks, Redshift, Snowflake, and Synapse. The big takeaway is that all of the cloud warehouses perform well, but some are easier to use and tune than others.

As an ETL vendor, Fivetran often gets asked by customers which cloud data warehouse they should use, writes CEO George Fraser in a blog post Monday. To find out which warehouse was best, the company decided to do an apples-to-apples comparison (or at least as close to one as possible).

To conduct the test, Fivetran partnered with Brooklyn Data Co., a data and analytics consultancy. They used the TPC-DS data set, which is a decision support benchmark rolled out by the Transaction Processing Performance Council in 2015 that depicts data from an imaginary retailer. The database included 1TB of data, which is the smallest scale factor authorized by TPC-DS (it’s also available in 3TB, 10TB, 30TB, and 100TB scale factors).

Fivetran and Brooklyn Data Co. configured each data warehouse three different ways to account for differences in cost and performance. The lone exception was Google Cloud’s BigQuery, which can’t be configured because it’s only available as an on-demand service. While they ran AWS’s Amazon Redshift in three configurations, the partners only reported the results of one because they were not able to reproduce results across different configurations, they wrote.

The configurations of the five warehouses (Source: Fivetran Cloud Data Warehouse Benchmark)

The partners then ran 99 TPC-DS queries against the retail database, and calculated how long it took each to run. These queries were “complex,” Fivetran said, with lots of joins, aggregations, and subqueries. Each query was run only once, to prevent the database from caching the results.

The results of the tests show that all five warehouses are fairly equal in terms of performance and cost, with a definite cluster of warehouse results appearing on the cost vs. time graph. “All warehouses had excellent execution speed, suitable for ad hoc, interactive querying,” Fivetran wrote in its report.

Snowflake achieved the fastest average query execution time with the 2X configuration (an XLarge instance running on AWS), with a geomean time of about 5 seconds per query. However, at a cost of $32.26 per hour, Snowflake 2X was also the most expensive setup.

The single Amazon Redshift result, in a 1X configuration running on a 5x ra3.4xlarge AWS instance, sat right in the middle of the pack, with a $16.30 cost per hour. Redshift’s benchmark result was just a hair slower and a couple cents more expensive than Snowflake’s 1X instance.

Databricks achieved the lowest cost per hour, $4.64, with the 0.5X configuration, which ran on a Medium instance on AWS. With a geomean time of about 8 seconds per query, Databricks’ 2X configuration was a tad slower than Snowflake’s 2X configuration, at roughly the same cost per query. Its 1X configuration was a smidge slower than AWS’s, but significantly cheaper than either Snowflake’s or AWS’s 1X result. “We have been made aware of several issues with our Databricks results, and we are currently re-running that portion of the benchmark,” Fivetran CEO Fraser stated in the blog on Tuesday.

The results of the benchmark (Source: Fivetran Cloud Data Warehouse Benchmark)

The three results for Microsoft Azure Synapse showed the biggest variety when plotted on the cost vs. time graph. The Synapse 1X result was fairly close to the 1X results for Snowflake, Databricks, and AWS. But its 2X result (configured as DW2000c, or occupying 2000 data warehouse units in Azure) generated only marginal improvements in speed, while costing nearly twice as much (although it still was cheaper than Snowflake’s 2X result when computed using Fivetran’s “cost per hour” metric).

BigQuery was the odd man out, both in terms of the results and configuration (since it’s only available as an on-demand offering). The 1X configuration that Fivetran and Brooklyn Data Co. used for BigQuery generated suitable performance, coming in around 9 geomean seconds per query, which was right in the middle of the pack (Synapse 1X was slightly slower).
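The “geomean” times quoted in this article are geometric means of per-query run times, which dampen the effect of a few very slow queries compared with a plain average. As a rough illustration only, and not Fivetran’s actual methodology, the Python sketch below computes a geometric mean over a made-up list of run times and converts an hourly price into the cost of a single query of a given duration. The function names and the sample times are hypothetical; the $32.26 per hour and 5 second figures are the Snowflake 2X numbers reported above, and the assumption that cost scales linearly with run time is ours.

```python
import math


def geomean(times_s):
    """Geometric mean of a list of positive run times, in seconds."""
    if not times_s:
        raise ValueError("need at least one run time")
    if any(t <= 0 for t in times_s):
        raise ValueError("run times must be positive")
    # Averaging the logs and exponentiating avoids overflow from a long product.
    return math.exp(sum(math.log(t) for t in times_s) / len(times_s))


def cost_for_duration(cost_per_hour, seconds):
    """Cost of occupying a warehouse for `seconds`, assuming linear hourly billing."""
    return cost_per_hour * seconds / 3600.0


# Hypothetical run times (seconds); NOT taken from the Fivetran benchmark.
sample_times = [2.0, 4.0, 8.0, 64.0]

print("arithmetic mean:", sum(sample_times) / len(sample_times))
print("geometric mean: ", geomean(sample_times))

# Snowflake 2X figures from the article: $32.26 per hour, about 5 s geomean.
print("cost of one 5 s query:", round(cost_for_duration(32.26, 5.0), 4))
```

In this toy data set, the single 64-second query pulls the arithmetic mean well above the geometric mean, which is why a geomean is a common choice for summarizing a mixed workload like the 99 TPC-DS queries.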