(Old - last updated in 2018 and largely from before that) Collecting thoughts about data versioning.

TODO: See Rufus' revisioning work at Data Protocols.

Context: Project => Project Hub

We could also add things like a dataflows.yml to a repo to make a data pipeline, or a model.pkl file to store your machine learning analysis …

Approaches for storing large files and versioning them

For now I'll assume we use Git for versioning and we want large files outside of git.

Git LFS option

Use git-lfs and build a custom LFS server to store to arbitrary cloud storage.

Cons: you are limited to one cloud storage and can't pull data from different places (and there is no central caching). For example, suppose I have a project using an AWS public dataset. In this approach I have to first copy that large dataset down, add it to git (LFS) and push it into my own cloud storage via git lfs.

Manifest option (DVC approach and datahub.io)

We store a manifest in git of the "local" paths that are not in git but in cloud storage.

My sense is that Git LFS with custom backend storage works fine for most CKAN use cases in which the customer has their own storage. In more ML-oriented use cases the ability to have multiple data sources from different systems could be valuable. It seems to me some hybrid could be achieved using extensions to Data Resource (to use remote URLs that have a local cache) and a special client that is aware of those extensions.

See also this matrix comparison

MetaStore

What shall we use to create / manage git repos for us?

What shall we use to create the Hub part of the DataHub? For now definitely CKAN Classic.

References

Git (and Github) for Data - Rufus Pollock - July 2013.

We Need Distributed Revision/Version Control for Data - Rufus Pollock - July 2010.

Diffing and patching tabular data - Paul Fitzpatrick - Aug 2013.

Noms - Noms is a decentralized database philosophically descendant from the Git version control system. Now abandonware: the makers of Noms, Attic Labs, were acquired by Salesforce in Jan 2018 and development stopped at that point. "[...] You shouldn't rely on it unless you're willing to take over development yourself."

GitLab University: Big files in Git (Annex, LFS and others).

[...] code, visualization, data processing, data analytics)
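To make the manifest option concrete, here is a minimal sketch of what "a manifest in git of paths that live in cloud storage" could look like. It is an illustration only, not the actual DVC or datahub.io format: the function names, the `files` / `sha256` / `bytes` / `url` keys, and the `s3://example-bucket/...` base URL are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, remote_base: str) -> dict:
    """Map each local file to its checksum, size and (assumed) remote URL.

    Only this small mapping would be committed to git; the files
    themselves stay in cloud storage.
    """
    entries = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(data_dir).as_posix()
        entries[rel] = {
            "sha256": sha256_of(path),
            "bytes": path.stat().st_size,
            # Hypothetical layout: remote keys mirror the local relative paths.
            "url": f"{remote_base.rstrip('/')}/{rel}",
        }
    return {"files": entries}


def write_manifest(manifest: dict, target: Path) -> None:
    """Write the manifest as stable, diff-friendly JSON."""
    target.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Because each entry carries its own URL, different files could in principle point at different storage systems, which is the flexibility the single-backend Git LFS setup lacks. Uploading and downloading the files is deliberately left out of this sketch.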