softwareclones.org

University of Bremen, Software Engineering Group

rcf Documentation

rcf is our data format to store and analyze code clone data. It defines an extendible schema for code clone data and provides a Java API that eases the development of analysis code. The main ideas behind rcf:

How rcf organizes clone data

Each rcf file is initialized with a predefined schema, which models common aspects of code clones. We call this the core schema. The following UML diagram shows a simplified version of this schema.



Each entity has an id attribute, which serves as a unique key. The clientId attributes can be used for your own id information. A Version has a basepath. Each directory stores its path relative to this basepath.

Extending the schema

The core schema can be extended as needed. Conceptually, rcf consists of three different kinds of elements:

Relations
These model the entities that can be stored in an rcf file. CloneClass is one example of such a relation. You can think of a relation as a list of entries.
Attributes
Relations have attributes that can store values of a defined type. In the core schema, type is an attribute of the relation CloneClass with the type int.
Entries
An entry is one instance in a relation. While the relation CloneClass describes the concept of a clone class, an entry of that relation denotes one specific clone class. An entry defines a value for each attribute defined by the relation's schema.

In rcf it is possible to add arbitrary attributes and even relations. Attribute values can be of the primitive types int, float, boolean and String or reference another entry of any relation. An attribute can hold a scalar value or a list of values of the aforementioned types.
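The relationship between relations, attributes and entries can be pictured with a small in-memory model. The following is only a minimal sketch for illustration: the classes SketchRelation, SketchAttribute, SketchEntry and the enum SketchType are hypothetical stand-ins and are not part of the rcf Java API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the rcf API classes.
enum SketchType { INT, FLOAT, BOOLEAN, STRING, REFERENCE }

// An attribute has a name, a value type, and is either scalar or list-valued.
final class SketchAttribute {
    final String name;
    final SketchType type;
    final boolean isList;

    SketchAttribute(String name, SketchType type, boolean isList) {
        this.name = name;
        this.type = type;
        this.isList = isList;
    }
}

// An entry is one instance in a relation and holds one value per attribute.
final class SketchEntry {
    final int id;
    private final Map<String, Object> values = new HashMap<>();

    SketchEntry(int id) {
        this.id = id;
    }

    void set(SketchAttribute attribute, Object value) {
        values.put(attribute.name, value);
    }

    Object get(SketchAttribute attribute) {
        return values.get(attribute.name);
    }
}

// A relation is a named list of entries that share the same attributes.
final class SketchRelation {
    final String name;
    final List<SketchAttribute> attributes = new ArrayList<>();
    final List<SketchEntry> entries = new ArrayList<>();

    SketchRelation(String name) {
        this.name = name;
    }

    SketchAttribute addAttribute(String attributeName, SketchType type, boolean isList) {
        SketchAttribute attribute = new SketchAttribute(attributeName, type, isList);
        attributes.add(attribute);
        return attribute;
    }

    // Creates a new entry whose id is its position in the relation.
    SketchEntry append() {
        SketchEntry entry = new SketchEntry(entries.size());
        entries.add(entry);
        return entry;
    }
}

final class ConceptDemo {
    public static void main(String[] args) {
        SketchRelation cloneClasses = new SketchRelation("CloneClass");
        SketchAttribute type = cloneClasses.addAttribute("type", SketchType.INT, false);
        SketchEntry cloneClass = cloneClasses.append();
        cloneClass.set(type, 2);
        System.out.println(cloneClasses.name + " entry " + cloneClass.id
                + " has type " + cloneClass.get(type));
    }
}
```

In this sketch a reference attribute would simply store another SketchEntry as its value, and a list attribute would store a List of values.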

The schema is stored in the rcf file itself. This means that no schema definition or interfaces need to be provided together with an rcf file that contains an extended schema.

Using the Java API

rcf can be accessed and modified via our Java library (see downloads section). The library implements classes and access functions for all relations and attributes defined in the core schema. It also provides generic functions to access relations and attributes that were added by the user. See the API docs for a complete reference.

The following example shows how an existing rcf can be used to calculate the average token count of all fragments over all versions.

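int avgFragmentSize(RCF rcf) {
  // avoid a division by zero if the rcf contains no fragments
  if (rcf.getFragments().size() == 0) return 0;
  int sum = 0;
  // sum up the token counts of all fragments of all clone classes in all versions
  for (Version v : rcf.getVersions()) {
    for (CloneClass cc : v.getCloneClasses()) {
      for (Fragment f : cc.getFragments()) {
        sum += f.getNumTokens();
      }
    }
  }
  // integer division: the result is truncated
  return sum / rcf.getFragments().size();
}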

Each relation in the core schema is represented by its own class in the API. The call to rcf.getVersions() returns an object of the type Versions, which represents the version relation. All relation objects are iterable. Using for(Version v: rcf.getVersions()) will iterate over all versions. Objects of the type Version represent one entry of the version relation, that is, one concrete version.
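How a relation object can take part in a for-each loop may be easier to see in a stripped-down form. The following sketch assumes nothing about the library's internals: VersionSketch and VersionsSketch are hypothetical stand-ins for Version and Versions, and the only point shown is that a class implementing java.lang.Iterable can be used directly in an enhanced for loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for one entry of the version relation (not the rcf class).
final class VersionSketch {
    final String basepath;

    VersionSketch(String basepath) {
        this.basepath = basepath;
    }
}

// Hypothetical stand-in for the relation object; implementing Iterable
// is what makes "for (VersionSketch v : versions)" compile.
final class VersionsSketch implements Iterable<VersionSketch> {
    private final List<VersionSketch> entries = new ArrayList<>();

    VersionSketch append(String basepath) {
        VersionSketch version = new VersionSketch(basepath);
        entries.add(version);
        return version;
    }

    @Override
    public Iterator<VersionSketch> iterator() {
        return entries.iterator();
    }
}

final class IterationDemo {
    // Collects the basepaths of all versions in iteration order.
    static List<String> basepaths(VersionsSketch versions) {
        List<String> result = new ArrayList<>();
        for (VersionSketch v : versions) {
            result.add(v.basepath);
        }
        return result;
    }

    public static void main(String[] args) {
        VersionsSketch versions = new VersionsSketch();
        versions.append("/project/v1");
        versions.append("/project/v2");
        System.out.println(basepaths(versions));
    }
}
```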

Accessing the attributes is straightforward. For all attributes in the core schema, get and set methods exist (e.g. f.getNumTokens()). It might not always be suitable to set every attribute for every entry, so attribute values can be left unset. An attempt to access an unset attribute value raises a ValueNotSetException, which is a RuntimeException (so it does not have to be caught explicitly). If it is unclear whether an attribute value is set for every entry, you can either catch the exception or use a variant of the get method that takes a default value. In our example: public int getNumTokens(int defaultValue). This returns the given default value if the numTokens attribute is not set.
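The two ways of dealing with unset values can be sketched as follows. This is an illustration under assumptions, not the library's implementation: FragmentSketch is a hypothetical stand-in for Fragment, and the ValueNotSetException defined here only mimics the exception of the same name described above.

```java
// Hypothetical stand-in mimicking the unchecked exception described in the text.
class ValueNotSetException extends RuntimeException {
    ValueNotSetException(String attributeName) {
        super("Attribute value not set: " + attributeName);
    }
}

// Hypothetical stand-in for a fragment entry with a single attribute.
final class FragmentSketch {
    private Integer numTokens; // null models an unset attribute value

    void setNumTokens(int value) {
        numTokens = value;
    }

    // Strict getter: fails if the value was never set.
    int getNumTokens() {
        if (numTokens == null) {
            throw new ValueNotSetException("numTokens");
        }
        return numTokens;
    }

    // Lenient getter: falls back to the caller's default value.
    int getNumTokens(int defaultValue) {
        return numTokens == null ? defaultValue : numTokens;
    }
}

final class UnsetValueDemo {
    public static void main(String[] args) {
        FragmentSketch fragment = new FragmentSketch();

        // Option 1: use the variant that takes a default value.
        System.out.println(fragment.getNumTokens(0));

        // Option 2: catch the unchecked exception.
        try {
            fragment.getNumTokens();
        } catch (ValueNotSetException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The default-value variant tends to read better inside loops such as the averaging example, while catching the exception is useful when an unset value should be reported rather than silently replaced.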

Besides holding scalar values, attributes can also contain lists of values of the same type. The call of v.getCloneClasses() accesses the list attribute cloneclasses of the Version entry v. This returns a List<CloneClass>.

Loading an rcf file

To load an rcf file, use the following code:

File file = new java.io.File("/path/to/rcf");
AbstractPersistenceManager apm = PersistenceManagerFactory.getPersistenceManager(file);
RCF rcf = apm.load(file);

The second line selects a suitable PersistenceManager to load the given file. The third line loads the file.

Adding & changing data

Most relations provide convenience methods to add new entries. You should use these, because they ensure that your data complies with the rcf conventions. The methods have the prefix add*. For instance, to add a fragment it is necessary to find out whether its file and directory have been added to the rcf before, SourcePositions need to be created and linked from the fragment, etc. All these things are done automatically by calling Fragments.addFragment():

RCF rcf = //...
Version v = rcf.getVersions().getFirstEntry();
Fragments fragments = rcf.getFragments();
// add a fragment in "/path/to/file.c" 
// at lines 12:2-16:34 with 23 tokens to version v
Fragment f = fragments.addFragment("/path/to/file.c", v, 12, 16, 2, 34, 23);

Nevertheless, you can add entries manually: call the relation's append() method, which returns a newly created entry with all values unset. To set values, call the set methods. The following example shows how a new Version is added manually.

RCF rcf = //...
Versions versions = rcf.getVersions();
Version v = versions.append();
v.setClientId(1);
v.setBasepath("/a/b/c");

When an rcf model is no longer needed, it must be closed explicitly using the close() method.
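One common way to make sure close() is always called is a try/finally block. The sketch below uses a hypothetical stand-in class, ModelSketch, in place of the real model; only the existence of a close() method is taken from the text above, and whether the real class can also be used with try-with-resources is not assumed here.

```java
// Hypothetical stand-in for an rcf model; only close() is taken from the text.
final class ModelSketch {
    private boolean closed = false;

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    // Placeholder for any analysis that needs an open model.
    int analyze() {
        if (closed) {
            throw new IllegalStateException("model is already closed");
        }
        return 42;
    }
}

final class CloseDemo {
    static int runAnalysis(ModelSketch model) {
        try {
            return model.analyze();
        } finally {
            // Runs whether analyze() returns normally or throws.
            model.close();
        }
    }

    public static void main(String[] args) {
        ModelSketch model = new ModelSketch();
        int result = runAnalysis(model);
        System.out.println(result + ", closed: " + model.isClosed());
    }
}
```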

Extending the Schema using the API

The schema can be extended using the API functions. The following snippet shows how to add a new relation with attributes and how to use it.

RCF rcf = //...

//add a new relation
Relation<Entry> functions = rcf.addRelation("Function");

//add name attribute
Attribute attName = functions.addScalarAttribute(
                             "name",
                             AttributeType.STRING);

//add length attribute
Attribute attLength = functions.addScalarAttribute(
                             "length",
                             AttributeType.INTEGER);

//add start and end attributes
SourcePositions sourcePositions = rcf.getSourcePositions();
Attribute attStart = functions.addReferenceAttribute(
                             "start", AttributeType.REFERENCE,
                             sourcePositions);
Attribute attEnd = functions.addReferenceAttribute(
                             "end", AttributeType.REFERENCE,
                             sourcePositions);

//add a function entry
Entry function = functions.append();
function.setString(attName, "myFunction");
function.setInt(attLength, 10);
function.setEntry(attStart, sourcePositions.append());
function.setEntry(attEnd, sourcePositions.append());

//set the values of the SourcePositions
//...

In general, all relations, attributes and entries can be handled using the generic class types Relation<Entry>, Attribute and Entry. To access an attribute value, pass the attribute either as an Attribute object or by its name. For relations and attributes that extend the core schema, these generic classes must be used.