Tagged: Presto

  • admin 10:16 am on July 25, 2017 Permalink
    Tags: Athena, backs, Launch, Presto

    Amazon backs Presto with their launch of Athena 

    Latest imported feed items on Analytics Matters

  • admin 9:52 am on June 28, 2016 Permalink
    Tags: Presto

    Business Intelligence Leaders Join with Teradata to Enhance Presto for the Enterprise 

    Looker, Qlik, Tableau and more bring choices to the enterprise-ready SQL-on-Hadoop solution
    Teradata United States

  • admin 9:47 am on May 12, 2016 Permalink
    Tags: Presto

    The Power of Presto Open Source for the Enterprise 

    Teradata White Papers

  • admin 10:34 am on March 12, 2016 Permalink
    Tags: Area, Presto

    Bay Area Presto MeetUp 

    Join us at the latest Presto Meetup at Facebook's Menlo Park campus (1 Hacker Way, Menlo Park, CA). We will have talks by the Presto team, Teradata, and other community members. Please arrive 15 minutes early and check in at Building 10. There will be food and drinks. Please RSVP, as space is limited. Make sure to bring your business card to be entered into a drawing to win a raffle prize.
    Teradata Events

  • admin 10:33 am on January 6, 2016 Permalink
    Tags: Petabyte, Presto, Seconds

    The Magic of Presto: Petabyte Scale SQL Queries in Seconds 

    In this webinar, two early commercial adopters of Presto – Teradata and Treasure Data – will give a technical overview of Presto, covering what makes Presto a great choice for SQL-on-Hadoop, Presto's internals (including the principles behind its design), in-production use cases both in the cloud and on-premises, and how to get started with Presto.
    Teradata Events

  • admin 9:55 am on December 16, 2015 Permalink
    Tags: Presto

    Presto Overview Video 

    Teradata Videos

  • admin 9:52 am on September 29, 2015 Permalink
    Tags: Presto

    Teradata Accelerates Roadmap for Open Source Presto 

    Teradata responds to community’s request for enterprise-class ODBC/JDBC Drivers for Presto; opening business intelligence and analytic applications for the open source query engine
    Teradata News Releases

  • admin 10:36 am on September 24, 2015 Permalink
    Tags: Presto

    Presto, an Open Source SQL Engine for Big Data 

    Please join Presto founding engineers from Facebook and Presto contributors from Teradata. This talk will cover an overview of Presto, current use cases, technical discussions of development efforts at both Facebook and Teradata, and future plans for Presto.
    Teradata Events

  • admin 9:55 am on July 30, 2015 Permalink
    Tags: Begins, Presto, SQLOnHadoop

    Teradata’s SQL-on-Hadoop Strategy Begins with Presto 

    Teradata Press Mentions

  • admin 9:51 am on June 26, 2015 Permalink
    Tags: Presto

    Why We Love Presto 

    Concurrent with acquiring Hadoop companies Hadapt and Revelytix last year, Teradata opened the Teradata Center for Hadoop in Boston. Teradata recently announced that a major new initiative of this Hadoop development center will include open-source contributions to a distributed SQL query engine called Presto. Presto was originally developed at Facebook, and is designed to run high performance, interactive queries against Big Data wherever it may live — Hadoop, Cassandra, or traditional relational database systems.

    Among the people who will be part of this initiative and contributing code to Presto is a subset of the Hadapt team that joined Teradata last year. In the following, we will dive deeper into the thinking behind this new initiative from the perspective of the Hadapt team. It is important to note upfront that Teradata's interest in Presto, and the people contributing to the Presto codebase, extend beyond the Hadapt team that joined Teradata last year. Nonetheless, it is worthwhile to understand the technical reasoning behind Teradata's embrace of Presto, even if this presents a localized view of the overall initiative.

    Around seven years ago, Ashish Thusoo and his team at Facebook built the first SQL layer over Hadoop as part of a project called Hive. At its essence, Hive was a query translation layer over Hadoop: it received queries in a SQL-like language called HiveQL and transformed them into a set of MapReduce jobs over data stored in HDFS on a Hadoop cluster. Hive was truly the first project of its kind. However, because its focus was on query translation into Hadoop's existing MapReduce execution engine, it achieved tremendous scalability but poor efficiency and performance, which ultimately led to a series of subsequent SQL-on-Hadoop solutions that claimed 100X speed-ups over Hive.

    Hadapt was the first such SQL-on-Hadoop solution that claimed a 100X speed-up over Hive on certain types of queries. Hadapt was spun out of the HadoopDB research project from my team at Yale and was founded by a group of Yale graduates. The basic idea was to develop a hybrid system that could achieve the fault-tolerant scalability of the Hive MapReduce query execution engine while leveraging techniques from the parallel database system community to achieve high-performance query processing.

    The intention of HadoopDB/Hadapt was never to build its own query execution layer. The first version of Hadapt used a combination of PostgreSQL and MapReduce for distributed query execution. In particular, the query operators that could be run locally, without reliance on data located on other nodes in the cluster, were run using PostgreSQL’s query operator set (although Hadapt was written such that PostgreSQL could be replaced by any performant single-node database system). Meanwhile, query operators that required data exchange between multiple nodes in the cluster were run using Hadoop’s MapReduce engine.

    Although Hadapt was 100X faster than Hive for long, complicated queries involving hundreds of nodes, its reliance on Hadoop MapReduce for parts of query execution precluded sub-second response times for small, simple queries. Therefore, in 2012, Hadapt started to build a secondary query execution engine called "IQ," intended for smaller queries. The idea was that all queries would be fed through a query-analyzer layer before execution. If a query was predicted to be long and complex, it would be fed to Hadapt's original fault-tolerant MapReduce-based engine; if it was predicted to complete in a few seconds or less, it would be fed to the IQ execution engine.
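    The two-engine dispatch described above can be sketched in a few lines. Note that this is a hypothetical illustration: the function names, cost model, and threshold are invented for clarity and are not Hadapt's actual API.

```python
# Hypothetical sketch of Hadapt's two-engine routing idea: estimate a
# query's runtime, then dispatch short queries to the interactive "IQ"
# engine and long ones to the fault-tolerant MapReduce-based engine.

INTERACTIVE_THRESHOLD_SECONDS = 5.0  # illustrative cutoff, not Hadapt's

def estimate_runtime_seconds(query_plan):
    """Toy cost model: sum per-operator runtime estimates."""
    return sum(op.get("estimated_seconds", 1.0) for op in query_plan)

def route(query_plan):
    """Return which engine should execute the plan."""
    if estimate_runtime_seconds(query_plan) <= INTERACTIVE_THRESHOLD_SECONDS:
        return "IQ"          # low-latency engine for short queries
    return "MapReduce"       # fault-tolerant engine for long queries

# Example: a small scan/filter plan vs. a large scan/join plan
short_plan = [{"op": "scan", "estimated_seconds": 0.5},
              {"op": "filter", "estimated_seconds": 0.1}]
long_plan = [{"op": "scan", "estimated_seconds": 40.0},
             {"op": "join", "estimated_seconds": 120.0}]

print(route(short_plan))  # IQ
print(route(long_plan))   # MapReduce
```

    The key design point is that the routing decision happens before execution, from the optimizer's estimates, so a mispredicted "short" query pays the cost of running on the wrong engine.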

    In 2013, Hadapt integrated IQ with Apache Tez in order to avoid redundant engineering effort, since the primary goals of IQ and Tez were aligned. In particular, Tez was designed as an alternative to MapReduce that can achieve interactive performance for general data processing applications. Indeed, Hadapt was able to achieve interactive performance on a much wider range of queries when leveraging Tez than it had previously.

    Figure 1: Intertwined Histories of SQL-on-Hadoop Technology

    Unfortunately, Tez was not quite a perfect fit as a query execution engine for Hadapt's needs. The largest issue was that before shipping data over the network during distributed operators, Tez first writes this data to local disk. The overhead of writing this data to disk (especially when the intermediate result set was large) precluded interactivity for a non-trivial subset of Hadapt's query workload. A second problem was that the Hive query operators implemented over Tez use (by default) traditional Volcano-style row-by-row iteration: a single function invocation for a query operator processes just a single database record. This results in a large number of function calls when processing a large dataset, and poor instruction cache locality, since the instructions associated with a particular operator are repeatedly reloaded into the instruction cache on each invocation. Although Hive and Tez have started to alleviate this issue with the recent introduction of vectorized operators, Hadapt still found that query plans involving joins or SQL functions would fall back to row-by-row iteration.
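    The Volcano-versus-vectorized distinction can be made concrete with a toy filter operator. This is a deliberately simplified sketch of the execution styles discussed above, not code from Hive, Tez, or Presto.

```python
# Volcano-style: one function invocation per record. On large inputs
# this means many calls, and the operator's instructions are evicted
# and reloaded between invocations of different operators.
def volcano_filter(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row

# Vectorized: one function invocation per batch of records, so the
# operator's instructions stay hot in the cache across the whole batch.
def vectorized_filter(batches, predicate):
    for batch in batches:
        yield [row for row in batch if predicate(row)]

rows = list(range(10))
batches = [rows[i:i + 4] for i in range(0, len(rows), 4)]
is_even = lambda x: x % 2 == 0

print(list(volcano_filter(rows, is_even)))                              # [0, 2, 4, 6, 8]
print([r for batch in vectorized_filter(batches, is_even) for r in batch])  # [0, 2, 4, 6, 8]
```

    Both produce the same rows; the difference is purely in how often operator code is entered, which is what drives the CPU-efficiency gap described above.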

    The Hadapt team therefore decided to refocus its query execution strategy (for the interactive-query part of Hadapt's engine) on Presto, which presented several advantages over Tez. First, Presto pipelines data between distributed query operators directly, without writing to local disk, significantly improving performance for network-intensive queries. Second, Presto query operators are vectorized by default, improving CPU efficiency and instruction cache locality. Third, Presto dynamically compiles selective query operators to bytecode, which lets the JVM optimize and generate native machine code. Fourth, it uses direct memory management, avoiding Java object allocations, their heap memory overhead, and garbage collection pauses. Overall, Presto is a very advanced piece of software, and very much in line with Hadapt's goal of leveraging as many techniques from modern parallel database system architecture as possible.
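    The first advantage, pipelining versus materialization, can be illustrated with an event log. This is a conceptual sketch only: neither engine operates on Python generators, and the "buffered list" merely stands in for Tez's write to local disk.

```python
# Contrast a materializing exchange (downstream operator sees nothing
# until the full intermediate result exists) with a pipelining exchange
# (each row is handed downstream as soon as it is produced).

def producer(log):
    for i in range(3):
        log.append(f"produce {i}")
        yield i

def materialized_exchange(rows, log):
    buffered = list(rows)        # stand-in for spilling to local disk
    for row in buffered:
        log.append(f"consume {row}")

def pipelined_exchange(rows, log):
    for row in rows:             # stream each row straight downstream
        log.append(f"consume {row}")

mat_log, pipe_log = [], []
materialized_exchange(producer(mat_log), mat_log)
pipelined_exchange(producer(pipe_log), pipe_log)

print(mat_log)   # all "produce" events occur before any "consume"
print(pipe_log)  # "produce" and "consume" events interleave
```

    In the pipelined log the consumer starts work before the producer finishes, which is why avoiding the intermediate materialization matters so much for interactive latency.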

    The Teradata Center for Hadoop has thus fully embraced Presto as the core of its technology strategy for the execution of interactive queries over Hadoop. Consequently, it made logical sense for Teradata to take its involvement in Presto to the next level. Furthermore, Hadoop is fundamentally an open source project, and in order to become a significant player in the Hadoop ecosystem, Teradata needs to contribute meaningful and important code to the open source community. Teradata's recent acquisition of Think Big serves as further motivation for such contributions.

    Therefore, Teradata has announced that it is committed to making open source contributions to Presto, and has allocated substantial resources to doing so. Presto is already used by Silicon Valley stalwarts Facebook, Airbnb, Netflix, Dropbox, and Groupon. However, Presto's enterprise adoption outside of Silicon Valley remains small. Part of the reason for this is that the ease-of-use and enterprise features typically associated with modern commercial database systems are not fully available for Presto: an out-of-the-box, simple-to-use installer; database monitoring and administration tools; and third-party integrations are all missing. Therefore, Teradata's initial contributions will focus on these areas, with the goal of closing the gap so that Presto can be widely deployed in traditional enterprise applications. This will hopefully lead to more contributors and momentum for Presto.

    For now, Teradata’s new commitments to open source contributions in the Hadoop ecosystem are focused on Presto. Teradata is only committing to contribute a small amount of Hadapt code to open source — in particular those parts that will further the immediate goal of transforming Presto into an enterprise-ready, easy-to-deploy piece of software. However, Teradata plans to monitor Presto’s progress and the impact of Teradata contributions. Teradata may ultimately decide to contribute more parts of Hadapt to the Hadoop open source community. At this point it is too early to speculate how this will play out.

    Nonetheless, Teradata’s commitment to Presto and its commitment to making meaningful contributions to an open source project is an exciting development. It will likely have a significant impact on enterprise-adoption of Presto. Hopefully, Presto will become a widely used open source parallel query execution engine — not just within the Hadoop community, but due to the generality of its design and its storage layer agnosticism, for relational data stored anywhere.


    Daniel Abadi is an Associate Professor at Yale University, founder of Hadapt, and a Teradata employee following the recent acquisition. He does research primarily in database system architecture and implementation. He received a Ph.D. from MIT and an M.Phil. from Cambridge. He is best known for his research in column-store database systems (the C-Store project, which was commercialized by Vertica), high-performance transactional systems (the H-Store project, commercialized by VoltDB), and Hadapt (acquired by Teradata). http://twitter.com/#!/daniel_abadi

    The post Why We Love Presto appeared first on Data Points.

    Teradata Blogs Feed
