FMEM: A Fine- grained Memory Estimator for MapReduce Jobs

Like dokumenter
GeWare: A data warehouse for gene expression analysis

Neural Network. Sensors Sorter

Reliable RT Spotify

Oppgave 1a Definer følgende begreper: Nøkkel, supernøkkel og funksjonell avhengighet.

Unit Relational Algebra 1 1. Relational Algebra 1. Unit 3.3

Virginia Tech. John C. Duke, Jr. Engineering Science & Mechanics. John C. Duke, Jr.

Multimedia in Teacher Training (and Education)

Gir vi de resterende 2 oppgavene til én prosess vil alle sitte å vente på de to potensielt tidskrevende prosessene.

Slope-Intercept Formula

matematikk s F4814A8B1B37D77C639B3 Matematikk S1 1 / 6

PSi Apollo. Technical Presentation

Han Ola of Han Per: A Norwegian-American Comic Strip/En Norsk-amerikansk tegneserie (Skrifter. Serie B, LXIX)

HONSEL process monitoring

FASMED. Tirsdag 21.april 2015

Improving Customer Relationships

Skog som biomasseressurs: skog modeller. Rasmus Astrup

Issues and challenges in compilation of activity accounts

Exercise 1: Phase Splitter DC Operation

Andrew Gendreau, Olga Rosenbaum, Anthony Taylor, Kenneth Wong, Karl Dusen

Estimating Peer Similarity using. Yuval Shavitt, Ela Weinsberg, Udi Weinsberg Tel-Aviv University

Den europeiske byggenæringen blir digital. hva skjer i Europa? Steen Sunesen Oslo,

Fakultet for informasjonsteknologi, Institutt for datateknikk og informasjonsvitenskap AVSLUTTENDE EKSAMEN I. TDT42378 Programvaresikkerhet

Databases 1. Extended Relational Algebra

Jeanette Wheeler, C-TAGME University of Missouri-Kansas City Saint Luke s Mid America Heart Institute

SAS FANS NYTT & NYTTIG FRA VERKTØYKASSA TIL SAS 4. MARS 2014, MIKKEL SØRHEIM

AMS-case forts. Eksemplifisering av modellbasert. tilnærming til design av brukergrensesnitt

Medisinsk statistikk, KLH3004 Dmf, NTNU Styrke- og utvalgsberegning

Endelig ikke-røyker for Kvinner! (Norwegian Edition)

1. Explain the language model, what are the weaknesses and strengths of this model?

Moving Objects. We need to move our objects in 3D space.

Litteraturoversikter i vitenskapelige artikler. Hege Hermansen Førsteamanuensis

EN Skriving for kommunikasjon og tenkning

5 E Lesson: Solving Monohybrid Punnett Squares with Coding

Server-Side Eclipse. Bernd Kolb Martin Lippert it-agile GmbH

UNIVERSITETET I OSLO ØKONOMISK INSTITUTT

Legacy System Exorcism by Pareto s Principle. Kristoffer Kvam/Rodin Lie Kjetil Jørgensen-Dahl

Kundetilfredshetsundersøkelse FHI/SMAP

case forts. Alternativ 1 Alternativer Sammensetning Objekt-interaktor med valg

STILLAS - STANDARD FORSLAG FRA SEF TIL NY STILLAS - STANDARD

PATIENCE TÅLMODIGHET. Is the ability to wait for something. Det trenger vi når vi må vente på noe

verktøyskrin Grafisk profil ved Norges teknisk-naturvitenskapelige universitet

Hvordan kvalitetssikre åpne tidsskrift?

Dynamic Programming Longest Common Subsequence. Class 27

2A September 23, 2005 SPECIAL SECTION TO IN BUSINESS LAS VEGAS

Midler til innovativ utdanning

Confidence-based Data Management for Personal Area Sensor Nets

UNIVERSITETET I OSLO ØKONOMISK INSTITUTT

Stationary Phase Monte Carlo Methods

Examination paper for (BI 2015) (Molekylærbiologi, laboratoriekurs)

TriCOM XL / L. Energy. Endurance. Performance.

Tor Haakon Bakken. SINTEF Energi og NTNU

Fra tradisjonell komponentbasert overvåking 5l tjenestebasert overvåking. April 2017

MID-TERM EXAM TDT4258 MICROCONTROLLER SYSTEM DESIGN. Wednesday 3 th Mars Time:

TUNNEL LIGHTING. LED Lighting Technology

Administrasjon av postnummersystemet i Norge Post code administration in Norway. Frode Wold, Norway Post Nordic Address Forum, Iceland 5-6.

Server-Side Eclipse. Martin Lippert akquinet agile GmbH

verktøyskrin Grafisk profil ved Norges teknisk-naturvitenskapelige universitet

DA DET PERSONLIGE BLE POLITISK PDF

Existing Relay-based Interlocking System Upgrades (NSI-63) Ombygging av relebasert sikringsanlegg (NSI-63) Entreprise UBF 42 / Contract UBF 42

FIRST LEGO League. Härnösand 2012

Internasjonale aktiviteter innenfor funksjonskontroll med bidrag fra PFK

Brukers Arbeidsflate. Tjeneste Katalog. Hva vi leverer... Presentasjon Administrasjon Automatisering

Den som gjør godt, er av Gud (Multilingual Edition)

Software applications developed for the maritime service at the Danish Meteorological Institute

Prosjekt Adressekvalitet

UNIVERSITETET I OSLO ØKONOMISK INSTITUTT

AVSLUTTENDE EKSAMEN I/FINAL EXAM. TDT4237 Programvaresikkerhet/Software Security. Mandag/Monday Kl

Call function of two parameters

Public roadmap for information management, governance and exchange SINTEF

Kanskje en slide som presenterer grunderen?

KROPPEN LEDER STRØM. Sett en finger på hvert av kontaktpunktene på modellen. Da får du et lydsignal.

Trigonometric Substitution

A NEW REALITY. DNV GL Industry Outlook for Kjell Eriksson, Regional Manager Oil & Gas, Norway 02 Februar - Offshore Strategi Konferansen 2016,

On Capacity Planning for Minimum Vulnerability

Bostøttesamling

Internationalization in Praxis INTERPRAX

Tjenestekjøp i offentlig sektor

Samlede Skrifter PDF. ==>Download: Samlede Skrifter PDF ebook

Enkel og effektiv brukertesting. Ida Aalen LOAD september 2017

Eksamensoppgave i PSY3100 Forskningsmetode - Kvantitativ

Røde Kors Grunnkurs i Førstehjelp

Interaction between GPs and hospitals: The effect of cooperation initiatives on GPs satisfaction

Utvikling av skills for å møte fremtidens behov. Janicke Rasmussen, PhD Dean Master Tel

Hvor finner vi flått på vårbeiter? - og betydning av gjengroing for flåttangrep på lam på vårbeite

Digital Transformasjon

Physical origin of the Gouy phase shift by Simin Feng, Herbert G. Winful Opt. Lett. 26, (2001)

Digitization of archaeology is it worth while?

Management of the Construction Process, from the perspective of Veidekke

The CRM Accelerator. USUS February 2017

SRP s 4th Nordic Awards Methodology 2018

Generalization of age-structured models in theory and practice

ISO 41001:2018 «Den nye læreboka for FM» Pro-FM. Norsk tittel: Fasilitetsstyring (FM) - Ledelsessystemer - Krav og brukerveiledning

Hybrid Cloud and Datacenter Monitoring with Operations Management Suite (OMS)

// Translation // KLART SVAR «Free-Range Employees»

SVM and Complementary Slackness

From Policy to personal Quality

Eksamensoppgave i GEOG1005 Jordas naturmiljø

Climate change and adaptation: Linking. stakeholder engagement- a case study from

Examination paper for SØK2009 International Macroeconomics

The building blocks of a biogas strategy

Transkript:

FMEM: A Fine- grained Memory Estimator for MapReduce Jobs Lijie Xu 1,2, Jie Liu 1, and Jun Wei 1 1 Institute of Software, Chinese Academy of Sciences 2 University of Chinese Academy of Sciences 6/26/2013

Outline What is FMEM? Why do we develop it? Is it a hard problem? How to develop it? How to estimate the fine- grained memory usage? Evaluation Related work Conclusion

What is FMEM? It is designed to analyze, predict and optimize the fine- grained memory usage of map and reduce tasks.

Why do we want to develop FMEM? For users feel hard to specify the memory- related configurations. e.g., buffer size, reducer number, Xmx, Xms, and etc. OutOfMemory error or performance degradation For task schedulers e.g., YARN and Mesos they allocate resource and schedule tasks according to the CPU and memory requirement of mappers/ reducers. For Hadoop committers design a more memory- efficient framework.

So, what is the fundamental problem? Tell job s memory usage before running it on big dataset. f (Data d, Conf f, Program p, Resource r) à memory usage mu of mappers and reducers Add <SampData sd, SampConf sf, SampProfiles sp> à memory usage mu of real mappers and reducers Big data sampling sample sampconf Dataflow& memory usage Sample job profiles Conf FMEM big job profiles

Is it a hard problem? User layer Data map() { } reduce() { } Pig /HiveQL Configura<on Memory Framework layer JVM disk Execution layer

How to do it? dataflowà objectsà usage User layer Data map() { } reduce() { } Pig /HiveQL Configura<on Memory Framework layer JVM Dataflow volume disk In- memory objects size Execution layer memory usage rules of memory usage

How to do it? dataflowà objectsà usage Built- in Monitor get Records/Bytes Statistics & real- time memory usage Profiler Instrument get sta<s<cs count intermediate data and TempObjs in each phase Dataflow volume

How to do it? dataflowà objectsà usage Estimate the dataflow We actually build a MapReduce simulator to estimate the intermediate data under specific <Data d, Conf f, sample profiles sp>. Dataflow volume

Which are memory- consuming objects? Map & Spill Phase Merge Phase Shuffle Phase Sort Phase Reduce Phase Input Split map() Spill Buffer kv buffer kvindices kvoffsets partition, sort, combine, spill to disk Segment merge on disk HTTP fetch shuffle, combine, in-memory & on-disk merge OnDiskSeg OnDiskSeg OnDiskSeg merge & sort reduce() Output Memory buffer (configuration) In- memory e.g., spill buffer (io.sort.mb) and user- defined arrays objects size <K, V> records (dataflow) map() outputs records into spill buffer, in- memory segments in shuffle buffer Temporary objects (dataflow and programs) byproducts of records such as char[], byte[], String and ArrayList A WordCount mapper produces 480MB java.nio.heapcharbuffer objects with 64MB input split. Others Spill piece partition Other Mappers Other Reducers InMemSeg code segment, native libraries, call stack and etc. The Same Memory Buffer ShufBuf

How to do it? dataflowà objectsà usage in- memory objects dataflow volume configuration memory usage management mechanism generation GC effects summarized rules job history

How to do it? dataflowà objectsà usage Rules- statistics based approach Size of in- memory objects Old part TmpObjs Spill Buffer Mapper JVM New S0 S1 Eden Records TmpObjs Old part TmpObjs InMemSeg InMemSeg InMemSeg Reducer JVM New S0 S1 Eden Records TmpObjs Management mechanism GC effects RULE 1. MapOU f(conf) RULE 2. MapNGU f(xmx,records, TempObjs) RULE 3. RedNGU f(xms,xmx, Records, TempObjs) RULE rules of 4. RedOU f(xms, Xmx, Records, TempObjs) memory usage

How to do it? dataflowà objectsà usage User layer Data map() { } reduce() { } Pig /HiveQL Configura<on Memory Framework layer JVM Dataflow volume disk In- memory objects size Execution layer memory usage rules of memory usage

Evaluation Buffer size Max- min heap size Reducer number Split size relative error= emu rmu/rmu 100%

How about the related work? Execution time estimation ParaTimer, KAMD, ARIA and etc. Find optimum configurations Starfish can predict job s performance (mainly runtime) with different configurations and has a cost- based optimizer to find the optimum configuration. Find optimum GC policy for multicore MapReduce Garbage collection auto- tuning for java MapReduce on multi- cores, ISMM 2011

Conclusion and discussion Provide a detailed analysis of job s memory usage considering dataflow and memory management from user- level to JVM internals. Develop a fine- grained memory estimator predict job s memory usage in a large space of configurations. FMEM will be improved to do auto- configuration, auto- optimization and etc. JIRA MAPREDUCE- 4882 & MAPREDUCE- 4883

Thanks, Q&A