Paper notes and evaluation

Implementation related papers

Paper Summary and Survey

We try to divide all papers into three categories:

- Heuristics (smart selections)
- Machine Learning (anything to do with training)
- Hinting (possibly combined with one of the former)

Heuristics

Block-Level Data Migration in Tiered Storage System 1

  • eviction and prefetching of data
  • older paper from 2010; the "not focused" research they talk about is now in focus anyway
  • upgrade from frequency based methods
  • metrics based approach with:
    • write/read frequency
    • write/read size
    • average write/read size
    • data placement
    • relevance between data blocks (defined by)
    • block valuation based on the aforementioned factors and 2 compensation coefficients
  • getting the formulas all translated could be quite a bit of work but no roadblock
  • no evaluation done in the paper
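
A minimal sketch of the metric-based valuation described above; the paper's formulas are not translated here, so the weighting and the two compensation coefficients (`alpha`, `beta`) are illustrative placeholders:

```python
# Hypothetical block valuation combining read/write frequency and average
# access size, scaled by two compensation coefficients (placeholders for the
# paper's actual formulas).
from dataclasses import dataclass

@dataclass
class BlockStats:
    reads: int           # read frequency (accesses per window)
    writes: int          # write frequency
    avg_read_size: int   # bytes
    avg_write_size: int  # bytes

def block_value(s: BlockStats, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Weighted valuation: higher values are candidates for promotion."""
    read_term = s.reads * s.avg_read_size
    write_term = s.writes * s.avg_write_size
    return alpha * read_term + beta * write_term

hot = BlockStats(reads=120, writes=10, avg_read_size=4096, avg_write_size=4096)
cold = BlockStats(reads=2, writes=1, avg_read_size=4096, avg_write_size=512)
```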

AutoTiering: Automatic data placement manager in multi-tier all-flash datacenter 2

  • focus on migration without HDDs, valid point
  • the common assumption of a 10x speed difference between tiers breaks down here
  • observed case is the migration of virtual machine disks
  • tiers: NVMe SSD, with TLC SSD as backing storage
  • no duplicates should be held due to space cost restrictions
  • aim to avoid frequent migration of singular units
  • tested in simulations
  • metadata collected:
    • IO statistics
    • IO access patterns
    • latency tests results
  • other factors:
    • migration costs not negligible
    • future impact of optimal storage provision
  • due to the speed of SSDs, complete files (in this case virtual machine disks) are observed rather than individual blocks
  • the "Optimization Framework" seems well extractable and could be integrated into Haura; maybe handle complete objects like this?
  • around the "Optimization Framework" another approximation is done, since the exact (set) placement problem is NP-hard; their algorithm runs in at most n^2k
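
Since the exact placement is NP-hard, a greedy stand-in conveys the flavor of such an approximation step; the heat values, capacities, and migration penalty below are invented for illustration and are not the paper's formulation:

```python
# Hypothetical greedy sketch of all-flash file placement: fill the fast tier
# with the highest-gain files that fit, charging a migration penalty for files
# that are not already there (migration costs are not negligible).
def place_greedy(files, fast_capacity, current_fast, migration_cost=0.1):
    """files: dict name -> (heat, size). Returns the set of names for the fast tier."""
    def gain(name):
        heat, size = files[name]
        penalty = 0.0 if name in current_fast else migration_cost * size
        return heat - penalty

    chosen, used = set(), 0
    for name in sorted(files, key=gain, reverse=True):
        size = files[name][1]
        if used + size <= fast_capacity and gain(name) > 0:
            chosen.add(name)
            used += size
    return chosen

files = {"vm-a": (100.0, 40), "vm-b": (60.0, 40), "vm-c": (5.0, 40)}
chosen = place_greedy(files, fast_capacity=80, current_fast={"vm-a"})
```

The penalty term is one way to express their aim of avoiding frequent migration of singular units: a file already on the fast tier needs less gain to stay there.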

LRU and FIFO (Cache) 3

  • eviction policies
  • focus on very large scale systems
  • they find: FIFO sometimes has a better overall cost than LRU, even with a lower hitrate
  • main point: LRU is not always better than FIFO
  • cache is now much more than just main memory and can be persistent, NVM, SSD
  • overall a good source for LRU and FIFO references, but not much beyond that for our purpose
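
For reference, minimal LRU and FIFO eviction policies (illustrative, not from the paper); on this short trace LRU scores more hits, while the paper's point is that hit rate alone does not determine overall cost:

```python
# Minimal LRU and FIFO caches tracking only hit/miss behavior.
from collections import OrderedDict, deque

class LRUCache:
    def __init__(self, capacity):
        self.cap, self.d = capacity, OrderedDict()

    def access(self, key):
        """Return True on hit; on miss, insert and evict the least recently used."""
        hit = key in self.d
        if hit:
            self.d.move_to_end(key)
        else:
            if len(self.d) >= self.cap:
                self.d.popitem(last=False)
            self.d[key] = True
        return hit

class FIFOCache:
    def __init__(self, capacity):
        self.cap, self.q, self.s = capacity, deque(), set()

    def access(self, key):
        """Return True on hit; on miss, insert and evict the oldest entry."""
        hit = key in self.s
        if not hit:
            if len(self.q) >= self.cap:
                self.s.discard(self.q.popleft())
            self.q.append(key)
            self.s.add(key)
        return hit

trace = [1, 2, 3, 1, 4, 1, 2]
lru = LRUCache(3)
lru_hits = sum(lru.access(k) for k in trace)
fifo = FIFOCache(3)
fifo_hits = sum(fifo.access(k) for k in trace)
```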

hatS: A Heterogeneity-Aware Tiered Storage for Hadoop 4

  • Full migration
  • Extensive paper which includes much more about hadoop, we skip forward to their data placement and retrieval policies
  • hadoop: write once, read many; we differ from this a bit since we also support write many, read many
  • they also create replicas for improved throughput, the rest is exactly our conditions
  • file level granularity
  • files are duplicated in multiple tiers, also for data loss protection reasons
  • tier aware policies depending on large selection of tiers
  • each tier is assigned a weight and a weighted random function determines the tier used for data retrieval
    • weights come from IOPS and capacity
  • they combine the tier-aware policy with another network-aware policy they created in the paper
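
The weighted random retrieval can be sketched as below; the paper derives weights from IOPS and capacity, but the concrete weighting formula here (product of the two, normalized) is an assumption:

```python
# Sketch of hatS-style tier-aware retrieval: each tier gets a normalized
# weight from its IOPS and capacity, and a weighted random draw picks which
# tier's replica serves a read. The iops*capacity product is illustrative.
import random

def tier_weights(tiers):
    """tiers: dict name -> (iops, capacity). Returns normalized weights."""
    raw = {t: iops * cap for t, (iops, cap) in tiers.items()}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}

def pick_tier(tiers, rng=random):
    weights = tier_weights(tiers)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]

tiers = {"ssd": (100_000, 1), "hdd": (200, 10)}
w = tier_weights(tiers)
```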

RTHMS: a tool for data placement on hybrid memory system 5

  • memory mapping (DRAM and HBM)

Machine Learning

Data Prefetching for Large Tiered Storage Systems (IBM) 6

  • Focus: prefetching from tape or cloud storage to faster tiers
  • works without file access history/traces; some files in long storage may not have them
  • uses file metadata for access patterns; here thought of as arbitrary information one might save for this purpose
  • long-term archival is more influenced by user behavior (less predictable) than application behavior (more predictable)
  • active and passive metadata provision; they mention e.g. tagging context ids to blocks
  • other work mentioned works on metadata affinity (more similar ones more likely to be fetched)
  • reacting to access information at IO time is too slow: read operations are often followed shortly by further read requests, within less time than a prefetch from cold storage takes
  • performed better than simple thresholds
  • they selected specific metadata fields related to their case study of radio telescope data; largely semantic
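
The metadata-affinity idea mentioned above (more similar items are more likely to be fetched together) could be sketched as a similarity ranking; Jaccard similarity over tag sets and the tag names are my illustrative assumptions, not the paper's method:

```python
# Hypothetical metadata-affinity prefetch: rank archived files by Jaccard
# similarity of their metadata tags to a just-requested file, prefetch top-k.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def prefetch_candidates(requested_tags, archive, k=2):
    """archive: dict name -> set of metadata tags. Returns k most similar names."""
    ranked = sorted(archive,
                    key=lambda n: jaccard(requested_tags, archive[n]),
                    reverse=True)
    return ranked[:k]

archive = {
    "obs-042": {"telescope:A", "band:L", "night:7"},
    "obs-043": {"telescope:A", "band:L", "night:8"},
    "cal-001": {"calibration", "band:C"},
}
cands = prefetch_candidates({"telescope:A", "band:L", "night:7"}, archive)
```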

SLRL: A Simple Least Remaining Lifetime File Eviction policy for HPC multi-tier storage systems 7

Eviction!

  • Differentiation between prefetching and eviction policies
  • supposed to have an up to 10% better hit ratio than LRU
  • they focus on eviction for this paper and all assumptions made in these notes are relevant to eviction policies
  • LRU not optimal for modern IO traces (especially named here object stores)
    • metadata management blamed for performance issues; hit-rate is still good but cost is increased (argued in IBM paper)
    • FIFO actually behaves better in this use case
    • for us this is maybe not that relevant as metadata is managed separately and does not need to follow this migration policy necessarily
  • they use the lifetimes from 8 and carry them in a queue to evict them in FIFO style
  • with low locality it performs worse than FIFO or LRU
  • with high locality it beats FIFO (by 40%) and LRU (by 10%)
  • lifetime (time between creation and last access) predicted on creation
  • eviction happens on low storage space
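
The mechanism described above can be sketched with a priority queue keyed by predicted expiry; this is a simplification of the FIFO-style queue the paper describes, and the file names and lifetimes are made up:

```python
# Sketch of SLRL-style eviction: each file's lifetime (creation to last
# access) is predicted at creation; on low storage space, the file with the
# least remaining predicted lifetime is evicted first.
import heapq

class SLRL:
    def __init__(self):
        self.heap = []  # entries: (predicted_death_time, name)

    def insert(self, name, created_at, predicted_lifetime):
        heapq.heappush(self.heap, (created_at + predicted_lifetime, name))

    def evict(self):
        """Evict the file whose predicted lifetime ends soonest."""
        return heapq.heappop(self.heap)[1]

tier = SLRL()
tier.insert("ckpt.h5", created_at=0, predicted_lifetime=100)
tier.insert("scratch.tmp", created_at=10, predicted_lifetime=5)
victim = tier.evict()
```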

Predicting file lifetimes for data placement in multi-tiered storage systems for HPC 9

  • eviction
  • CNN
  • path-dependent: each path segment is used as an indicator
  • two tier architecture, result is either archival or performance
  • though they leave open the possibility of introducing more tiers such as 3D XPoint (Optane) storage, so maybe also NVM
  • they build upon an existing metadata engine which manages and fetches metadata from a separate database
  • main assumption is that the file creation is mostly followed shortly by the last access to the file in scientific applications
  • much more data is written than is ever read back
  • complex neural network construction, paths are simply used as pre-learning input
  • the same dataset was used for validation as for training (a randomly selected 30%); the transferability of this model is not obvious
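
The path-segment input idea can be sketched as a simple encoding step; how exactly the paper tokenizes and feeds paths into the CNN is not reproduced here, so the vocabulary scheme and fixed length are assumptions:

```python
# Illustrative sketch: split a file path into segments and map each segment
# to an integer id, producing a fixed-length vector a model could consume.
def encode_path(path, vocab, max_segments=6):
    segments = [s for s in path.split("/") if s]
    # setdefault assigns the next free id to unseen segments; 0 is padding
    ids = [vocab.setdefault(s, len(vocab) + 1) for s in segments]
    return (ids + [0] * max_segments)[:max_segments]

vocab = {}
v1 = encode_path("/scratch/user1/run42/output.dat", vocab)
v2 = encode_path("/scratch/user1/run43/output.dat", vocab)
```

Shared prefixes map to the same ids, which is what makes the path a usable signal for lifetime prediction.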

Predicting File Lifetimes with Machine Learning 8

  • eviction
  • Random Forest & CNN
  • very similar approach to 9, paths are used with neural networks
  • group and user of the file are held as valuable here; though this is not the case for us

Block Popularity Prediction for Multimedia Storage Systems Using Spatial-Temporal-Sequential Neural Networks 10

  • STSNN
  • tested on traces built as follows: operation, logical block address, operation size, time
  • hot data blocks change rapidly and are somewhat uncertain in pattern prediction
  • complex behavior from just data blocks difficult (here we could work with hinting though, like 11)
  • correlation between blocks exist but are challenging to design a general neural network for

Most of what we've seen so far in the paper research is the preparation of traces from a known application, to see if that improves the workflow.

  • they want to extend the limited neural network approach by considering a specific use case - multimedia
    • divide io behavior (frequency) into time blocks; here 100000 IOP
    • handle each block like a time series problem; as they seem to observe strong spatial and temporal correlations
    • predict next time step
  • data recorded and formatted to include spatial and temporal information (like for us object or dataset accordance and time)

This could be abstracted by recording workloads, both scientific and user-based ones; though it should not outperform metadata-based approaches in some cases.

  • large blocks used, 64M in one of the datasets, but also 4M blocks scenarios are tested
  • features they extract for learning:
    • frequency of read/write and in correlation to one another
    • read size avg, min, max
    • aligned reads (size = 2^n) and non-aligned reads, as well as their ratios
    • neighborhood frequency; previous 10 blocks and following 10 blocks are used as a vector with their read frequency
  • heavily focused later on read operations
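
The feature list above can be sketched directly from a trace of (operation, LBA, size) records; the field names and the neighborhood width are illustrative, not the paper's exact definitions:

```python
# Sketch of per-block feature extraction from an (op, lba, size) trace.
from collections import Counter

def block_features(trace, block, neighborhood=2):
    reads = [size for op, lba, size in trace if op == "R" and lba == block]
    read_freq = Counter(lba for op, lba, _ in trace if op == "R")
    # aligned read: size is a power of two (size = 2^n)
    aligned = sum(1 for s in reads if s and s & (s - 1) == 0)
    return {
        "read_count": len(reads),
        "write_count": sum(1 for op, lba, _ in trace if op == "W" and lba == block),
        "read_size_avg": sum(reads) / len(reads) if reads else 0.0,
        "read_size_min": min(reads, default=0),
        "read_size_max": max(reads, default=0),
        "aligned_ratio": aligned / len(reads) if reads else 0.0,
        # read frequencies of neighboring blocks, as in their neighborhood vector
        "neighbors": [read_freq[block + d]
                      for d in range(-neighborhood, neighborhood + 1) if d],
    }

trace = [("R", 10, 4096), ("R", 10, 3000), ("W", 10, 4096), ("R", 11, 4096)]
f = block_features(trace, 10)
```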

Maybe checkpoints break this assumption, as they are mostly written and seldom read, if they are read at all.

  • extensive description of the network architecture
  • multiple variations of the network are tested which observe quite different behavior in their evaluation

Automating Distributed Tiered Storage Management in Cluster Computing 12

Archivist: A Machine Learning Assisted Data Placement Mechanism for Hybrid Storage Systems 13

  • multiple Neural Networks

A reinforcement learning framework for online data migration in hierarchical storage systems 14

  • they argue that heuristically "claims are harder to justify that they improve the overall system behavior"
  • future-proof optimizations to avoid frequent reshuffling -> performance impacting
  • a choice between pre-emptive loading and migration, or migration on next access; though this does not prevent a full storage
  • they use the latter as mostly performance is in focus, could be interesting
  • no mention of which inputs are used; a generalist method, only mentioning that (obviously) read/write events are the carriers

Efficient Hierarchical Storage Management Empowered by Reinforcement Learning 15

Implementation Candidate

  • they use a similar setup to ours, consisting of SSD, HDD, and some remote object store
  • hotness scale as some float from 0 to 1
  • file granularity (data points: hotness, size, type, use-case dependent properties)
  • LRU, LFU and size-temperature replacement are too static
  • this is supposed to allow for more dynamic access patterns
  • online approach
  • looks doable, they define their inputs clearly and argue that they can be changed
  • states and actions are defined
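
Since they define states and actions clearly, a toy sketch of the general shape is possible; everything below (hotness buckets as states, tiers as actions, the latency-based reward, a stateless tabular update) is my own illustrative assumption, not the paper's actual design:

```python
# Tabular Q-learning toy: learn which tier each hotness bucket should land on.
import random

STATES = range(4)                  # hotness buckets: 0 (cold) .. 3 (hot)
ACTIONS = ["ssd", "hdd", "remote"]
LATENCY = {"ssd": 1.0, "hdd": 10.0, "remote": 100.0}

def reward(hotness, tier):
    # penalize hot data on slow tiers and cold data occupying fast tiers
    return -hotness * LATENCY[tier] - (3 - hotness) * (10.0 / LATENCY[tier])

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
rng = random.Random(42)
for _ in range(5000):
    s = rng.choice(list(STATES))
    a = rng.choice(ACTIONS)        # pure exploration keeps the sketch short
    q[(s, a)] += 0.1 * (reward(s, a) - q[(s, a)])  # stateless value update

def best(s):
    return max(ACTIONS, key=lambda a: q[(s, a)])
```

With no state transitions this degenerates to a contextual bandit; the paper's online approach would additionally carry the storage state between steps.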

An Intelligent Data Placement Strategy for Hierarchical Storage Systems 16

Tiered Data Storage Model 17

Hinting

HFA: A Hint Frequency-based approach to enhance the I/O performance of multi-level cache storage systems 11

  • Largely framework design, but maybe we can get something else out of it too
  • division into three kinds of hints:
    • promotion
    • demotion
    • application

HintStor: A Framework to Study I/O Hints in Heterogeneous Storage 18

  • Concept of stream ID:

    • This largely adds multiple access-pattern hints; it seems to bring a huge performance impact, especially since each pattern is also bound to certain storage tiers:
      • HDD: sequential read, sequential write
      • SSD: random read
      • SCM(NVM): random write
      • user facing classes:
        • read-intensive
        • write-intensive
        • backup/archival
      • though this is freely expandable, and we might want to give additional hints when working with stream id like interface
  • Data classification

    • base approach to determine which files of unknowns or equal class to migrate/replicate
    • smaller files are preferred
    • this can be well abstracted into the object store interface
    • some additional messages are required for this though we have all the data
  • four base operations are designed for the base block-wise operations

    1. redirect - data was moved and is now at an external position (a different disk); not relevant to us
    2. migrate - heatmap of most used chunks are created and the most frequently used items are moved in a top-k scheme every 2 hours (time range can be modified, not very reactive)
    3. replicate - store multiple versions of a chunk, meant to decrease latency on very active drives
    4. prefetch - load to some kind of buffer e.g. DRAM, allows for partial transfers from disk to cache -> lower latency as they can be answered faster and do not require locking of chunks
  • I/O access hints

    • static: metadata, file size, file type, migration interval, chunk size
    • smaller files can be promoted more easily; larger ones are harder due to their high inertia
    • a shorter migration interval logically brings a lower average latency; they move 1000 files at once each interval
    • dynamic: file open, write/read file, stream ID, cloud prefetch, task priority (background and foreground tasks)
    • priority could be interesting though probably for another day, as this can bring a lot more implementation effort
  • they achieve some good results in their scenarios, in which they used btrfs and ext4 as the modified file systems

  • Main Conclusions:

    • improvement can be made with data placement by evaluating static system hints as defined above, though several levels must be used, metadata, file size, types,...
    • streamIDs improve the caching efficiency, data which can be served well with lower tiers might not need to be moved upwards
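
The stream-ID mapping described above can be written out directly; the dictionaries mirror the notes' pattern-to-tier table and user-facing classes, while the class-to-pattern assignment is my assumption of how the pieces connect:

```python
# Sketch of the stream-ID idea: user-facing classes map to access-pattern
# hints, which map to preferred tiers. Freely expandable with custom classes.
PATTERN_TIER = {
    "sequential_read": "hdd",
    "sequential_write": "hdd",
    "random_read": "ssd",
    "random_write": "scm",   # SCM/NVM
}
CLASS_PATTERN = {
    "read-intensive": "random_read",
    "write-intensive": "random_write",
    "backup/archival": "sequential_write",
}

def preferred_tier(stream_class):
    """Resolve a user-facing class to its preferred storage tier."""
    return PATTERN_TIER[CLASS_PATTERN[stream_class]]
```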

Towards a QoS-aware Virtualised Storage System 19

General

Multi-Tier Buffer Management and Storage System Design for Non-Volatile Memory 20

Optimizing Hierarchical Storage Management For Database System 21

Data Jockey: Automatic Data Management for HPC Multi-tiered Storage Systems 22

  • bulk data movement and placement of datasets

    • similar to our datasets, as they are bound to scientific applications in HPC
  • division into control and data plane (similar to our daemon setup with control messages in a centralized service)

  • somewhat attached to job schedulers

  • they implement a complete resource manager and IO interface to assist the scientific workflow style, see also job schedulers

  • many automation steps are meant to be included, like automated archival for preservation

  • as well as an interface to work with multiple clusters

  • works with multiple replications

  • "migration" is in this case simply done via a file system POSIX interface, or with a copy in geographically distributed storage

    • the policy might still be interesting: Data Orchestration
  • "Data Orchestration":

    • data placement: users can specify a preferred target when writing
    • data movement (migration): tries to fit the user's choice, but priorities decide when storage is tight
    • resource management: replication and promotion goals are determined via graph algorithms where bandwidth between locations are considered
  • automation largely done for preconfigured requirements and distributed usage in clusters, ideas about burst buffers nice but nothing new, graph approach is interesting but not too usable for us in the current stage

Sorted out - non-applicable

Sorted out - unrelated


  1. https://doi.org/10.1109/ICCNT.2010.61

  2. https://doi.org/10.1109/PCCC.2017.8280433

  3. https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf

  4. https://doi.org/10.1109/CCGrid.2014.51

  5. https://doi.org/10.1145/3156685.3092273

  6. https://doi.org/10.1109/ICDM.2017.99

  7. https://doi.org/10.1145/3503646.3524297

  8. https://doi.org/10.1007/978-3-030-34356-9_23

  9. https://doi.org/10.1145/3439839.3458733

  10. https://doi.org/10.1145/3474085.3475495

  11. https://doi.org/10.1109/PADSW.2014.7097831

  12. https://doi.org/10.48550/arXiv.1907.02394 and https://doi.org/10.14778/3357377.3357381

  13. https://doi.org/10.1109/ICCD46524.2019.00098

  14. https://doi.org/10.1007/s11227-007-0135-3

  15. https://doi.org/10.1109/TKDE.2022.3176753

  16. https://doi.org/10.1109/ICCC51575.2020.9345165

  17. https://doi.org/10.1109/WECONF.2019.8840589

  18. https://doi.org/10.1145/3489143

  19. https://www.doc.ic.ac.uk/~wjk/publications/franciosi-knottenbelt-ukpew-2009.pdf

  20. https://doi.org/10.48550/arXiv.1901.10938

  21. https://uwspace.uwaterloo.ca/handle/10012/8531

  22. https://doi.org/10.1109/IPDPS.2019.00061