This is pretty long, hope you still read it.
I'd like to see SubGit determine project-to-project copies automatically, at least for one-time conversions from Subversion to Git. This may or may not be computationally intensive, but it would definitely save people time trying to define what has happened in an arbitrary repository. SubGit seems very close to being able to do this. This may or may not be a problem when synchronizing across Subversion and Git if more than one path to each project was used.
The use-case for this would be if the trunk of a project was used to start another project with its own {trunk,branches,tags}. Note that this could happen multiple times.
Is it possible to stitch together history across arbitrary copies within the Subversion repository? I believe that this information is there within the Subversion repository (though it may be computationally expensive to determine) in the Copy-from, etc. data. Going backward in time seems doable with a single pass. Going forward seems like you might need to get all the data for the whole repository, then stitch it back together. It depends on the metadata and I am not an expert.
If it turned out to be possible to go both forward and backward in time given a project, and a user requested both project_a and project_b, they would get the same history with the repository and branches/tags named differently. Branches/tags could be named based on the absolute path within the repository.
The following example assumes that a project within a Subversion repository such as /1/2/3/4/p would result in a git repository at path /1/2/3/4/p.git. This accounts for having the same name in several places within the repository and uses the path to disambiguate instead of numbering the projects (p, p1, p2, etc.).
Another option would be to prefix all the branches and tags, though I don't think this is a great idea due to the extreme differences when a project-to-project copy has not occurred.
I'm not sure what this is doing. Is this pointing into the other translated git repository? If so, that is pretty interesting and might solve the problem, though if I understand correctly, this would currently require the user to modify subgit.conf after configuration. It is not done automatically by SubGit during configuration or install. I've actually done something like this with my test conversions, though without the refs. It definitely gets all the {trunk,branches,tags} related to the conceptual project into a single git project, which is useful. I think using the original paths from the Subversion directory structure for the git project name will make the most sense to people (e.g., [git "/1/2/3/4/project_a"]).
However, I don't think that SubGit will find the other projects automatically, which is a real problem for large, existing Subversion repositories. Detecting the copies from the repository itself, rather than having a user input all the relationships would be very useful and seems doable from my naive perspective. Cleaning up extra information from the git repositories caused by detected copies seems a much better problem than missing information regarding the true state of a project. Is this something that supporting multiple branches and tags layouts will address? It is not really a different layout, it's just properly following all copies from arbitrary paths within a project (or maybe just from the roots of the project in {trunk,branches,tags} to limit the complexity) into arbitrary paths (or project root paths) within the repository. Do you think that is theoretically possible?
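The copy detection asked about here can be sketched outside SubGit. The example below is only an illustration, not SubGit's implementation. It assumes the history is available as `svn log --verbose --xml` output, where copied paths carry `copyfrom-path` and `copyfrom-rev` attributes. The sample log is made up, and the names `project_root` and `find_project_copies` are hypothetical. It also assumes that everything before the first trunk/branches/tags component is the project root.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of `svn log --verbose --xml` output,
# using the paths from the example in this thread.
SAMPLE_LOG = """<log>
  <logentry revision="100">
    <paths>
      <path action="A" kind="dir">/1/2/3/4/project_a/trunk</path>
    </paths>
  </logentry>
  <logentry revision="250">
    <paths>
      <path action="A" kind="dir"
            copyfrom-path="/1/2/3/4/project_a/trunk"
            copyfrom-rev="249">/5/6/7/8/project_b/trunk</path>
    </paths>
  </logentry>
  <logentry revision="260">
    <paths>
      <path action="A" kind="dir"
            copyfrom-path="/5/6/7/8/project_b/trunk"
            copyfrom-rev="259">/5/6/7/8/project_b/branches/b1</path>
    </paths>
  </logentry>
</log>
"""

LAYOUT_DIRS = ("trunk", "branches", "tags")


def project_root(path):
    """Return the part of an SVN path before the first trunk/branches/tags component."""
    parts = path.strip("/").split("/")
    for i, part in enumerate(parts):
        if part in LAYOUT_DIRS:
            return "/" + "/".join(parts[:i])
    return None  # the path is not inside a standard layout


def find_project_copies(log_xml):
    """Collect copies whose source and destination belong to different projects."""
    copies = []
    for entry in ET.fromstring(log_xml).iter("logentry"):
        for path in entry.iter("path"):
            source = path.get("copyfrom-path")
            if source is None:
                continue  # a plain add or modify, not a copy
            src_root = project_root(source)
            dst_root = project_root(path.text)
            if src_root and dst_root and src_root != dst_root:
                copies.append((int(entry.get("revision")), source,
                               int(path.get("copyfrom-rev")), path.text))
    return copies


for copy in find_project_copies(SAMPLE_LOG):
    print(copy)
```

In this sample only the copy at revision 250 crosses project roots; the copy at revision 260 is an ordinary branch creation inside project_b and is skipped.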
though if I understand correctly, this would currently require the user to modify subgit.conf after configuration. It is not done by SubGit automatically during configuration or install by SubGit.
Correct, currently SubGit doesn't detect this kind of configuration automatically. On the other hand, the subgit.conf file format is pretty simple, so you can write your own config generator that takes into account all the project-wide copies, etc.
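A minimal sketch of such a config generator follows. It only reproduces the shape of the [git "A"] section shown in this thread: the key names (`translationRoot`, `path`, `trunk`, `branches`, `tags`) are copied from that example, and whether a given SubGit release accepts them is not verified here. The function name `subgit_section` and its parameters are hypothetical.

```python
def subgit_section(section_id, own_root, git_path, related_roots=()):
    """Render one [git "<id>"] section in the style used in this thread.

    own_root      -- SVN path of the project this Git repository is for
    related_roots -- SVN paths of projects it was copied from or to
    """
    own = own_root.strip("/")
    lines = [
        f'[git "{section_id}"]',
        "translationRoot = /",
        f"path = {git_path}",
        # The project's own layout maps to top-level refs.
        f"trunk = {own}/trunk:refs/heads/trunk",
        f"branches = {own}/branches/*:refs/heads/*",
        f"tags = {own}/tags/*:refs/tags/*",
    ]
    for root in related_roots:
        rel = root.strip("/")
        # Related projects are kept apart by prefixing refs with their SVN path.
        lines += [
            f"branches = {rel}/trunk:refs/heads/{rel}/trunk",
            f"branches = {rel}/branches/*:refs/heads/{rel}/branches/*",
            f"tags = {rel}/tags/*:refs/tags/{rel}/tags/*",
        ]
    return "\n".join(lines)


print(subgit_section("A", "project_a", "project_a.git", ["project_b"]))
print(subgit_section("B", "project_b", "project_b.git", ["project_a"]))
```

The list of related roots still has to come from somewhere, for example from copy information in the Subversion log.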
I think using the original paths from the Subversion directory structure for the git project name will make the most sense to people. (e.g., [git "/1/2/3/4/project_a"]).
Our intention was to use something like `subgit configure --option git.<id>.repository="path/to/git/repo"`. Using "/1/2/3/4/project_a" as <id> here can cause some problems with the command line, though we'll definitely consider the option of using such a project identifier.
Summary:
With the current version of SubGit you can achieve the behavior you're asking for, but with some limitations: a) you have to configure the repository manually, and b) support for multiple branches and tags layouts is currently a work in progress.
We have certain plans for tracking project-wide copies better and automatically, but we have no time estimates for that right now. The important point here is that all the data structures we currently use do allow us to introduce the feature in later releases.
Thanks for the very detailed and informative replies. I've been very impressed with SubGit and I'm even more encouraged for the future. Keep up the excellent work! I like the approach you have outlined for the future, it's more complete than I imagined.
I'll try to see if I can generate the configuration I need manually to solve my near term goals.
Thanks for the response and I understand. I have mostly been evaluating against our massive and complex repository, just observing behavior. I have done some experimentation with what you described and it looks pretty good and I feel I could work around this without an issue. My use case is probably conversion of our repository to git at some point, so finding all those branches manually is not insurmountable as it is not something that we'll do more than once.
I've been impressed with the overall behavior of the system. This is definitely the highest fidelity Subversion->Git translation I have evaluated.
Regarding the fidelity, we have some enhancements in this area already: our core library supports translation of arbitrary SVN properties. We're planning to introduce this in SubGit 2.0. If you're interested in this feature, please upvote the SGT-325 issue.
Project A:
/1/2/3/4/project_a
/1/2/3/4/project_a/trunk
/1/2/3/4/project_a/tags/1.0
/1/2/3/4/project_a/tags/2.0
/1/2/3/4/project_a/branches/a1
/1/2/3/4/project_a/branches/a2
Project B:
/5/6/7/8/project_b
/5/6/7/8/project_b/trunk
/5/6/7/8/project_b/tags/1.0
/5/6/7/8/project_b/tags/2.0
/5/6/7/8/project_b/branches/b1
/5/6/7/8/project_b/branches/b2
You can configure SubGit as follows:
[git "A"]
translationRoot = /
path = /1/2/3/4/project_a.git
# Project A branches & tags
trunk = /1/2/3/4/project_a/trunk:refs/heads/trunk
branches = /1/2/3/4/project_a/branches/*:refs/heads/*
tags = /1/2/3/4/project_a/tags/*:refs/tags/*
# Project B branches & tags
branches = /5/6/7/8/project_b/trunk:refs/heads/5/6/7/8/project_b/trunk
branches = /5/6/7/8/project_b/branches/*:refs/heads/5/6/7/8/project_b/branches/*
tags = /5/6/7/8/project_b/tags/*:refs/tags/5/6/7/8/project_b/tags/*
[git "B"]
translationRoot = /
path = /5/6/7/8/project_b.git
# Project B branches & tags
branches = /5/6/7/8/project_b/trunk:refs/heads/trunk
branches = /5/6/7/8/project_b/branches/*:refs/heads/*
tags = /5/6/7/8/project_b/tags/*:refs/tags/*
# Project A branches & tags
branches = /1/2/3/4/project_a/trunk:refs/heads/1/2/3/4/project_a/trunk
branches = /1/2/3/4/project_a/branches/*:refs/heads/1/2/3/4/project_a/branches/*
tags = /1/2/3/4/project_a/tags/*:refs/tags/1/2/3/4/project_a/tags/*
Would that solve the problem?
Unfortunately, SubGit currently does not support multiple branches and tags layouts; we're going to support them soon after releasing SubGit 1.0.
Thanks for reading :)
I'm not sure what this is doing. Is this pointing into the other translated git repository? If so, that is pretty interesting and might solve the problem
Let's have a look at a bit simplified configuration:
[git "A"]
translationRoot = /
path = project_a.git
# Project A branches & tags
trunk = project_a/trunk:refs/heads/trunk
branches = project_a/branches/*:refs/heads/*
...
# Project B branches & tags
branches = project_b/trunk:refs/heads/project_b/trunk
branches = project_b/branches/*:refs/heads/project_b/branches/*
...
[git "B"]
translationRoot = /
path = project_b.git
# Project B branches & tags
branches = project_b/trunk:refs/heads/trunk
branches = project_b/branches/*:refs/heads/*
...
# Project A branches & tags
branches = project_a/trunk:refs/heads/project_a/trunk
branches = project_a/branches/*:refs/heads/project_a/branches/*
...
What happens on install and further synchronization:
1. SubGit detects project A and fetches all the changes specified in the corresponding section, so it converts changes of project_a/trunk into Git commits pointed to by refs/heads/trunk, ..., and changes of project_b/trunk into Git commits pointed to by refs/heads/project_b/trunk.
2. Then SubGit finds project B and fetches changes of project_b/trunk into Git commits pointed to by refs/heads/trunk.
As you might notice, SubGit fetches certain changes twice. That's a known limitation of the current version. On the other hand, after SubGit has synchronized all the changes, the history is actually preserved if project A was copied into project B at some point.
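To see why the same Subversion changes end up in both repositories, here is a small sketch of how one `svn-path:git-ref` rule could be applied. It assumes that `*` stands for exactly one path component; SubGit's real matching rules may differ, and `map_svn_path` and `refs_for` are hypothetical helpers, not SubGit functions.

```python
def map_svn_path(rule, svn_path):
    """Apply one 'svn/pattern:refs/pattern' rule; return the Git ref or None."""
    svn_pattern, ref_pattern = rule.split(":", 1)
    svn_pattern = svn_pattern.strip("/")
    svn_path = svn_path.strip("/")
    if "*" not in svn_pattern:
        return ref_pattern if svn_path == svn_pattern else None
    prefix, suffix = svn_pattern.split("*", 1)
    if not (svn_path.startswith(prefix) and svn_path.endswith(suffix)):
        return None
    name = svn_path[len(prefix):len(svn_path) - len(suffix)]
    if not name or "/" in name:
        return None  # assumption: '*' matches a single path component
    return ref_pattern.replace("*", name, 1)


def refs_for(section_rules, svn_path):
    """Return every Git ref that a section's rules produce for an SVN path."""
    refs = []
    for rule in section_rules:
        ref = map_svn_path(rule, svn_path)
        if ref is not None:
            refs.append(ref)
    return refs


# Rules taken from the simplified configuration in this thread.
SECTION_A = [
    "project_a/trunk:refs/heads/trunk",
    "project_a/branches/*:refs/heads/*",
    "project_b/trunk:refs/heads/project_b/trunk",
    "project_b/branches/*:refs/heads/project_b/branches/*",
]
SECTION_B = [
    "project_b/trunk:refs/heads/trunk",
    "project_b/branches/*:refs/heads/*",
    "project_a/trunk:refs/heads/project_a/trunk",
    "project_a/branches/*:refs/heads/project_a/branches/*",
]

# project_b/trunk is matched by a rule in each section, so it is converted twice.
print(refs_for(SECTION_A, "project_b/trunk"))
print(refs_for(SECTION_B, "project_b/trunk"))
```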
That's definitely doable. We just need to figure out the best way to handle this feature. What I've described above is just one way to approximately achieve what you're asking, though with some limitations.
About half a year ago we had a discussion on how to properly handle project-wide copies of branches and tags. I think the best option here is the following:
1. User specifies default branches & tags mapping:
[git "A"]
translationRoot = /project_a
path = project_a.git
# Project A branches & tags
trunk = trunk:refs/heads/trunk
branches = branches/*:refs/heads/*
...
[git "B"]
translationRoot = /project_b
path = project_b.git
# Project B branches & tags
branches = trunk:refs/heads/trunk
branches = branches/*:refs/heads/*
...
2. `subgit install` converts the existing SVN repository into multiple Git repositories using a shared object database, e.g. it employs alternates or some similar mechanism.
3. Once SubGit detects "copy project_a/trunk@100 => project_b/trunk", it creates a new Git commit in the project_b.git repository with the parent set to the Git commit corresponding to project_a/trunk at revision 100.
This way, no matter how many copies have been made between projects in the Subversion repository, every Git repository tracks the history of the corresponding branches. The only difference from the first approach is that the user doesn't get those refs/heads/project_a/* refs in the project_b.git repository, which might be rather welcome.
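The parent selection in step 3 can be modelled with a toy in-memory converter. This is a sketch of the idea only, not of SubGit's data structures: `Commit` and `Converter` are hypothetical, commits are assumed to be recorded in increasing revision order, and the shared object database is represented by a single dictionary that both repositories read from.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    repo: str        # Git repository the commit is created in
    path: str        # SVN branch path the commit was translated from
    revision: int    # SVN revision
    parents: list = field(default_factory=list)


class Converter:
    def __init__(self):
        # SVN branch path -> commits in revision order, shared by all repositories
        self.history = {}

    def latest_at(self, path, revision):
        """Return the newest commit of `path` at or before `revision`, if any."""
        older = [c for c in self.history.get(path, []) if c.revision <= revision]
        return older[-1] if older else None

    def commit(self, repo, path, revision, copied_from=None):
        """Record a commit; a copy takes its parent from the copy source."""
        if copied_from is not None:
            parent = self.latest_at(*copied_from)
        else:
            parent = self.latest_at(path, revision)
        new = Commit(repo, path, revision, [parent] if parent is not None else [])
        self.history.setdefault(path, []).append(new)
        return new


conv = Converter()
a90 = conv.commit("project_a.git", "project_a/trunk", 90)
a100 = conv.commit("project_a.git", "project_a/trunk", 100)
a120 = conv.commit("project_a.git", "project_a/trunk", 120)
# "copy project_a/trunk@100 => project_b/trunk", committed at a hypothetical r130
b130 = conv.commit("project_b.git", "project_b/trunk", 130,
                   copied_from=("project_a/trunk", 100))
b140 = conv.commit("project_b.git", "project_b/trunk", 140)

print(b130.parents[0].path, b130.parents[0].revision)
```

The first commit of project_b/trunk gets the project_a/trunk commit at revision 100 as its parent, not the later one at revision 120, so project_b.git carries project A's history up to the copy point without any refs/heads/project_a/* refs.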
The beauty of such mechanics is that it works pretty well in the opposite direction: consider a GitHub project u1/project; user u2 decides to fork u1/project and creates u2/project_fork. What should correspond to that in the Subversion repository? I think it's just "copy u1/project@HEAD => u2/project_fork": all the history is copied and SubGit conversion still works well.
http://issues.tmatesoft.com/issue/SGT-434
Some users have installed that build in production environments and have had no issues with it so far, so we consider it stable. You may use this build to enable option #1 of what I've described above:
[git "A"]
translationRoot = /
path = project_a.git
# Project A branches & tags
trunk = project_a/trunk:refs/heads/trunk
branches = project_a/branches/*:refs/heads/*
...
# Project B branches & tags
branches = project_b/trunk:refs/heads/project_b/trunk
branches = project_b/branches/*:refs/heads/project_b/branches/*
...
[git "B"]
translationRoot = /
path = project_b.git
# Project B branches & tags
branches = project_b/trunk:refs/heads/trunk
branches = project_b/branches/*:refs/heads/*
...
# Project A branches & tags
branches = project_a/trunk:refs/heads/project_a/trunk
branches = project_a/branches/*:refs/heads/project_a/branches/*
...
Regarding option #2, i.e. automatic detection of project-wide copies: unfortunately we have to postpone that, as we have a very tough roadmap towards SubGit 2.0, planned for the end of Q1/beginning of Q2, 2013. It is still hard to estimate when we'll be able to introduce this feature.
Hope the current workaround helps you.