Getafix: Workload-aware distributed interactive analytics
Ghosh, Mainak; Xu, Le; Qian, Xiaoyao; Kao, Thomas; Gupta, Indranil; Gupta, Himanshu
Permalink
https://hdl.handle.net/2142/89289
Description
- Title
- Getafix: Workload-aware distributed interactive analytics
- Author(s)
- Ghosh, Mainak
- Xu, Le
- Qian, Xiaoyao
- Kao, Thomas
- Gupta, Indranil
- Gupta, Himanshu
- Issue Date
- 2016-03-08
- Keyword(s)
- Data management
- Workload aware
- Lookback processing
- Date of Ingest
- 2016-03-08T21:12:45Z
- Abstract
- Distributed interactive analytics engines (Druid, Redshift, Pinot) need to achieve low query latency while using the least storage space. This paper presents a solution to the problem of replicating data blocks and routing queries. Our techniques decide the replication level of individual data blocks (based on popularity, access counts) and output optimal placement patterns for those data blocks. For the static version of the problem (a given set of queries accessing some segments), our techniques are provably optimal in both storage and query latency. For the dynamic version of the problem, we build a system called Getafix that dynamically tracks data block popularity, adjusts replication levels, dynamically routes queries, and garbage collects less useful data blocks. We implemented Getafix in Druid, the most popular open-source interactive analytics engine. Our experiments use both synthetic traces and production traces from Yahoo! Inc.’s production Druid cluster. Compared to existing techniques, Getafix either reduces storage space used by up to 3.5x while achieving comparable query latency, or improves query latency by up to 60% while using comparable storage.
- Type of Resource
- text
- Genre of Resource
- Technical Report
- Permalink
- http://hdl.handle.net/2142/89289