Reliable RT processing @ Spotify




Reliable RT processing @ Spotify Pablo Barrera <pablo@spotify.com> February 5, 2014

Spotify

Spotify: the right music for every moment. Over 6 million paying customers; over 24 million active users each month; over 20 million songs; over 1.5 billion playlists created so far; available in 55 markets.

i/o tribe: responsible for building the awesome infrastructure that supports the Spotify experience

Our goal: this looks easy

That was easy [figure missing]

but we have a problem...

Naïve approach (tm). Diagram: SYSLOG -> FILE -> SCP -> LOG ARCHIVER -> CURL -> HDFS PROXY -> HADOOP

Scalability: for(;;) { SCP(file) } for(;;) { CURL(file) }

We have a problem...: thousands of servers; several data centres; millions of users; 10 TB each day

Our needs: reliable delivery; fast data transfer; per-service subscription; low CPU overhead

Other options: ActiveMQ/RabbitMQ; Flume/Flume NG; others: Scribe, Chukwa, BookKeeper

Apache Kafka: distributed pub/sub system

Kafka coolness: at least once; read O(1); network bounded

Kafka architecture. Diagram: KAFKA PRODUCER, KAFKA BROKER (TOPIC A, TOPIC B, TOPIC C, TOPIC D, TOPIC E), KAFKA CONSUMER
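
The pub/sub idea on this slide can be illustrated with a toy, in-memory model. This is only a sketch under assumptions: the `Broker` and `Consumer` classes and the topic names are hypothetical and are not Kafka's API. It shows an append-only log per topic, with each consumer tracking its own read offset, which is what allows several services to subscribe to the same topic independently.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: one append-only list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        # Append the message and return the offset at which it was stored.
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def fetch(self, topic, offset, max_messages=10):
        # Consumers pull by offset; the broker keeps no per-consumer state.
        return self.topics[topic][offset:offset + max_messages]


class Consumer:
    """Each consumer tracks its own read position per subscribed topic."""

    def __init__(self, broker, topics):
        self.broker = broker
        self.offsets = {topic: 0 for topic in topics}

    def poll(self, topic):
        messages = self.broker.fetch(topic, self.offsets[topic])
        self.offsets[topic] += len(messages)
        return messages


broker = Broker()
broker.publish("topic-a", "login user=1")
broker.publish("topic-b", "play track=42")
broker.publish("topic-a", "logout user=1")

# Two hypothetical services with different subscriptions.
reporting = Consumer(broker, ["topic-a"])
realtime = Consumer(broker, ["topic-a", "topic-b"])

first_poll = reporting.poll("topic-a")   # both topic-a messages
second_poll = reporting.poll("topic-a")  # nothing new since the last poll
realtime_b = realtime.poll("topic-b")
```

Because the read position lives in the consumer, one slow or restarted subscriber does not affect the others.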

Cons: no reliability; no replication; manual tuning

Spotify <3 Kafka: running in production

Kafka at Spotify: key component of our log delivery system; Kafka 0.7.1; Java 7

Custom extensions: end-to-end reliable delivery; compression/encryption service

End-to-end reliable delivery

Diagram: production server, Service, KAFKA SYSLOG PRODUCER, KAFKA BROKER, KAFKA SYSLOG CONSUMER, HADOOP; labels: ACK, Checkpoint
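
The diagram only names an ACK and a Checkpoint; how they interact is not spelled out in the slides. One plausible reading, sketched below purely as an assumption, is that the producer advances a checkpoint only when the consumer acknowledges what it has stored, and re-sends everything after the checkpoint otherwise. `SyslogProducer`, `SyslogConsumer` and their methods are hypothetical names, not Spotify's code.

```python
class SyslogProducer:
    """Toy producer: re-sends every line after the last acknowledged one."""

    def __init__(self, lines):
        self.lines = lines
        self.checkpoint = 0  # index of the first line not yet acknowledged

    def pending(self):
        # Everything from the checkpoint onwards is (re)sent.
        return list(enumerate(self.lines))[self.checkpoint:]

    def on_ack(self, acked_up_to):
        # Only move forwards, so a late or duplicate ACK is harmless.
        self.checkpoint = max(self.checkpoint, acked_up_to)


class SyslogConsumer:
    """Toy consumer: stores lines by index, so re-delivery is idempotent."""

    def __init__(self):
        self.stored = {}

    def receive(self, batch):
        for index, line in batch:
            self.stored[index] = line
        # ACK the length of the contiguous prefix stored so far.
        acked = 0
        while acked in self.stored:
            acked += 1
        return acked


producer = SyslogProducer(["a", "b", "c", "d"])
consumer = SyslogConsumer()

# First attempt: only the first two lines arrive, and the ACK is lost.
consumer.receive(producer.pending()[:2])
checkpoint_after_lost_ack = producer.checkpoint

# Retry: the producer re-sends from its checkpoint; this time the ACK arrives.
producer.on_ack(consumer.receive(producer.pending()))
```

The lost ACK causes duplicate sends rather than data loss, which matches at-least-once delivery: the receiving side has to tolerate repeats.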

is that all?

Piece of cake, right?

Cross-site problems. Diagram: Zookeeper, Kafka Producer, Kafka Broker, Kafka Consumer, Hadoop

TCP window: TCP parameters for big latency; Linux TCP scaling algorithm
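
Why the TCP window matters across sites can be shown with bandwidth-delay arithmetic. The link speed and round-trip time below are hypothetical numbers chosen for illustration, not figures from the talk.

```python
def bandwidth_delay_product(bandwidth_bits_per_s, rtt_seconds):
    """Bytes that must be in flight to keep a link of this speed busy."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def max_throughput(window_bytes, rtt_seconds):
    """Upper bound in bits per second for one connection with this window."""
    return window_bytes * 8 / rtt_seconds


# Hypothetical cross-site link: 1 Gbit/s with a 100 ms round trip.
needed_window = bandwidth_delay_product(1e9, 0.100)

# 65535 bytes is the largest window without TCP window scaling.
capped = max_throughput(65535, 0.100)
```

With these assumed numbers the link needs roughly 12.5 MB in flight, while an unscaled window caps a single connection at roughly 5 Mbit/s. That gap is why window scaling and buffer sizes have to be tuned for high-latency paths.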

IPSEC: Linux IPSEC + firewall is slow; major drop in throughput; cannot tweak it at app level

Diagram: production server, Service, KAFKA SYSLOG PRODUCER, KAFKA SYSLOG ENCRYPTION, KAFKA BROKER, KAFKA SYSLOG CONSUMER, HADOOP; labels: Compressed, ACK, Checkpoint
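
The "Compressed" label can be illustrated with a small sketch of batching log lines before they leave the server. It assumes plain zlib from the Python standard library; the slides do not say which codec or cipher was used, and the encryption step is left out here. `pack_batch`, `unpack_batch` and the sample log lines are hypothetical.

```python
import zlib


def pack_batch(lines):
    """Join log lines into one payload and compress it."""
    payload = "\n".join(lines).encode("utf-8")
    return zlib.compress(payload, 6)


def unpack_batch(blob):
    """Inverse of pack_batch: decompress and split back into lines."""
    return zlib.decompress(blob).decode("utf-8").split("\n")


# Hypothetical, repetitive log lines (no embedded newlines).
lines = [
    f"2014-02-05T12:00:{i:02d} service=player event=play track={i}"
    for i in range(60)
]

raw_size = len("\n".join(lines).encode("utf-8"))
blob = pack_batch(lines)
restored = unpack_batch(blob)
```

Compressing whole batches rather than single lines gives the compressor repeated structure to exploit, and it reduces the number of bytes that have to cross the slow inter-site link.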

Garbage collector: 50% performance drop; 25% of CPU time; young generation tuning

[Figure: % of time spent doing Full GC before tuning. Y axis: time spent on Full GC (%), 0 to 100. X axis: time (minutes), 0 to 14.]

[Figure: % of time spent doing Full GC after tuning. Y axis: time spent on Full GC (%), 0 to 100. X axis: time (minutes), 0 to 1000.]

Hadoop: replication factor; stochastic failure mode; no real ack from Hadoop; files open for a long time

Apache Storm: distributed computation framework

Storm: abstractions (topology, bolt, stream, tuple, grouping); great community; ack + retries; but not for reliable apps, use Hadoop instead

Kafka integration: reliable data for reporting; low latency data for RT

Diagram: production server, Service, KAFKA SYSLOG PRODUCER, KAFKA BROKER, KAFKA SYSLOG CONSUMER, STORM; labels: Retries, ACK, Checkpoint
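
The "ack + retries" behaviour mentioned for Storm can be sketched as a source that keeps each emitted message until it is acknowledged and replays it on failure. This is a toy model under assumptions, not Storm's API: `ReplayingSource`, `flaky_bolt` and the retry limit are hypothetical.

```python
from collections import deque


class ReplayingSource:
    """Toy spout-like source: keeps emitted messages until they are acked."""

    def __init__(self, messages, max_retries=3):
        self.queue = deque((i, msg, 0) for i, msg in enumerate(messages))
        self.pending = {}  # msg_id -> (message, attempts so far)
        self.max_retries = max_retries
        self.dropped = []

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, msg, attempts = self.queue.popleft()
        self.pending[msg_id] = (msg, attempts + 1)
        return msg_id, msg

    def ack(self, msg_id):
        # Fully processed: forget the message.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Processing failed: replay unless the retry budget is used up.
        msg, attempts = self.pending.pop(msg_id)
        if attempts < self.max_retries:
            self.queue.append((msg_id, msg, attempts))
        else:
            self.dropped.append(msg_id)


def flaky_bolt(msg, seen):
    """Hypothetical processing step that fails the first time it sees 'b'."""
    if msg == "b" and "b" not in seen:
        seen.add("b")
        return False
    return True


source = ReplayingSource(["a", "b", "c"])
seen, processed = set(), []
while (item := source.next_tuple()) is not None:
    msg_id, msg = item
    if flaky_bolt(msg, seen):
        processed.append(msg)
        source.ack(msg_id)
    else:
        source.fail(msg_id)
```

In this sketch the replayed message is processed out of order, and once the retry budget runs out it is dropped. Trade-offs like these fit the slides' advice to use Hadoop for the reliable reporting path and Storm for low-latency data.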

RT apps

Storm

Thanks! Pablo Barrera <pablo@spotify.com>. Want to join the band? spotify.com/jobs. February 5, 2014