HEP Data Format Activity
There has been a flurry of activity for some time
Focus on columnar formats (storage or conversion)
DIANA-HEP
Parquet, Awkward Array, Femtocode, OAMap, etc.
IRIS-HEP
iDDS, ServiceX, DOMA R&D
ROOT Project
RDataFrame
Others
COFFEA
HEP-Google TIM March 24-26, 2020 2
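To illustrate what "columnar" means here, below is a minimal pure-Python sketch (standard library only) that turns a row-oriented list of event records into one contiguous typed `array` per field. The field names, schema, and values are hypothetical and chosen only for illustration. Real columnar formats such as Parquet add compression, encodings, and metadata on top of this basic layout.

```python
from array import array

# Hypothetical schema: field name -> array typecode ("q" = int64, "d" = float64).
SCHEMA = {"run": "q", "event": "q", "muon_pt": "d", "muon_eta": "d"}

# Row-oriented layout: one tuple per event (illustrative values, not real data).
rows = [
    (1, 101, 45.2, 0.31),
    (1, 102, 12.7, -1.05),
    (1, 103, 88.9, 2.10),
    (2, 201, 30.4, -0.44),
]


def to_columns(records, schema):
    """Transpose row records into one contiguous typed array per field."""
    columns = {name: array(code) for name, code in schema.items()}
    names = list(schema)
    for record in records:
        for name, value in zip(names, record):
            columns[name].append(value)
    return columns


columns = to_columns(rows, SCHEMA)

# A cut on one quantity scans a single contiguous array; the other fields
# are never touched, which is why columnar data is cheap to read selectively.
high_pt = [pt for pt in columns["muon_pt"] if pt > 25.0]
```

The design point is that a selection on `muon_pt` only walks the `muon_pt` buffer. In a row layout, the same selection must step over every field of every record.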
HEP Public Cloud Activity
Many projects leverage public clouds
HEPCloud (AWS)
HTCondor (AWS)
ICEPP GCPM Project (GCP)
ATLAS Data Ocean Project (GCP)
Many other independent projects
The Synthesis
Why not combine all these ideas?
Analysis using a public cloud
e.g., Google Cloud Platform (GCP)
With a cloud-storage-friendly data format
e.g., Parquet (https://parquet.apache.org/)
Suitable for an efficient in-memory representation
e.g., pandas (https://pandas.pydata.org/)
That Python-oriented physicists find useful
We should learn quite a lot
How We Got Here
August 2019
Informal discussion started (Andrew Hanushevsky & Ross Thomson)
September 2019
Project conceptualized
October 2019
Project formalized
Onboarded a Google engineer at 20% time (Guilhem Tesseyre)
November 2019 onwards
Various approaches investigated and tried
February 2020
Onboarded a physics analyst (Shawfeng Dong, SLAC ACF)
Project Goals I
Demonstrate efficient use of GCP for physics analysis
We are only addressing analysis here
Using Python as the language
The demonstration has two aspects
Workflow for setting up the required data flow
This usually requires data conversion
Workflow for running an analysis job