=======
= Csv =
=======

The Csv framework imports metrics or textual information and attaches them to artefacts of type Application, File or Function. No other artefact types are supported. If the same metric is found several times for the same artefact while parsing one or more input CSV files, only the last occurrence is used and the previous ones are ignored. If you are working with File artefacts, you can let the Data Provider create the artefacts that do not exist yet.
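
The "last occurrence wins" rule can be pictured with a minimal Python sketch. It is not the Data Provider's code: the function name collect_metrics and the in-memory CSV texts are hypothetical, and a tab separator with a Path column is assumed.

```python
import csv
import io

def collect_metrics(csv_texts, artefact_key="Path", separator="\t"):
    """Merge rows from several CSV texts; later values overwrite earlier ones."""
    metrics = {}  # (artefact path, metric name) -> value
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text), delimiter=separator):
            path = row[artefact_key]
            for name, value in row.items():
                if name != artefact_key:
                    # A repeated (artefact, metric) pair replaces the earlier value.
                    metrics[(path, name)] = value
    return metrics

# Two hypothetical input files reporting the same metric for the same file.
first = "Path\tBranchs\n./foo.c\t15\n"
second = "Path\tBranchs\n./foo.c\t18\n"
result = collect_metrics([first, second])
print(result)
```

Only the value read last for ./foo.c is kept, whichever file it came from.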

============
= form.xml =
============

You can customise form.xml to either:
- specify the path to a single CSV file to import
- specify a directory and a file name pattern to import all the CSV files matching this pattern

In order to import a single CSV file:
=====================================
<?xml version="1.0" encoding="UTF-8"?>
<tags baseName="Csv" needSources="true">
	<tag type="text" key="csv" defaultValue="/path/to/mydata.csv" />
</tags>

Notes: 
- The csv key is mandatory.
- Since Csv-based data providers commonly rely on artefacts created by Squan Sources, you can set the needSources attribute to force users to specify at least one repository connector when creating a project.

In order to import all files matching a pattern in a folder:
===========================================================
<?xml version="1.0" encoding="UTF-8"?>
<tags baseName="Csv" needSources="true">
	<!-- Root directory containing the CSV files to import -->
	<tag type="text" key="dir" defaultValue="/path/to/mydata" />
	<!-- Pattern that a file name must match in order to be imported -->
	<tag type="text" key="ext" defaultValue="*.csv" />
	<!-- search for files in sub-folders -->
	<tag type="booleanChoice" defaultValue="true" key="sub" />
</tags>

Notes: 
- The dir and ext keys are mandatory.
- The sub key is optional (its value is set to false if not specified).
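
As an illustration of how dir, ext and sub could combine, here is a minimal Python sketch. It only mimics the behaviour described above and makes no claim about how the Data Provider is implemented: find_csv_files and its parameters are hypothetical names, and the pattern is assumed to be a shell-style wildcard applied to file names only.

```python
import fnmatch
import os
import tempfile

def find_csv_files(root_dir, pattern="*.csv", recurse=False):
    """List files under root_dir whose name matches pattern (dir / ext / sub)."""
    matches = []
    for current_dir, sub_dirs, file_names in os.walk(root_dir):
        for name in sorted(file_names):
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(current_dir, name))
        if not recurse:
            sub_dirs.clear()  # sub = false: do not descend into sub-folders
    return matches

# Small demonstration on a throw-away directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nested"))
    for rel in ("a.csv", "notes.txt", os.path.join("nested", "b.csv")):
        open(os.path.join(root, rel), "w").close()
    flat = [os.path.relpath(p, root) for p in find_csv_files(root)]
    deep = [os.path.relpath(p, root) for p in find_csv_files(root, recurse=True)]

print(flat)
print(deep)
```

With recurse left to False, only a.csv is listed. With recurse set to True, nested/b.csv is listed as well.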


==============
= config.tcl =
==============

Sample config.tcl file:
=======================
# The separator used in the input CSV file
# Usually \t or ;
set Separator "\t"

# The delimiter used in the input CSV file
# This is normally left empty, except when you know that some of the values in the CSV file
# contain the separator itself, for example:
# "A text containing ; the separator";no problem;end
# In this case, you need to set the delimiter to \" in order for the data provider to find 3 values instead of 4.
# To include the delimiter itself in a value, you need to escape it by duplicating it, for example:
# "A text containing "" the delimiter";no problemo;end
# Default: none
set Delimiter \"

# ArtefactLevel is one of:
#      Application: to import data at application level
#      File: to import data at file level. In this case, ArtefactKey has to be set
#            to the value of the header (key) of the column containing the file path
#            in the input CSV file.
#      Function: to import data at function level. In this case:
#            ArtefactKey has to be set to the value of the header (key) of the column containing the path of the file
#            FunctionKey has to be set to the value of the header (key) of the column containing the name and signature of the function
# Note that the values are case-sensitive.
set ArtefactLevel File
set ArtefactKey File

# Should the File paths be case-insensitive?
# true or false (default)
# This is used when searching for a matching artefact in already-existing artefacts.
set PathsAreCaseInsensitive "false"

# Should file artefacts declared in the input CSV file be created automatically?
# true (default) or false 
set CreateMissingFile "true"

# FileOrganisation defines the layout of the input CSV file and is one of:
#     header::column: values are referenced from the column header
#     header::line: NOT AVAILABLE
#     alternate::line: lines are a sequence of {Key Value}
#     alternate::column: columns are a sequence of {Key Value}
# There are more examples of possible CSV layouts later in this document
set FileOrganisation header::column

# Metric2Key contains a case-sensitive list of entries pairing a measure ID with a key:
#     {MeasureID KeyName [Format]}
# where:
#   - MeasureID is the ID of the measure as defined in your analysis model
#   - KeyName is, depending on the FileOrganisation, either the name of the column or the name
#      in the cell preceding the value to import, as found in the input CSV file
#   - Format is the optional format of the data. The only accepted format is "text",
#      which attaches textual information to an artefact. Omit this field for normal metrics.
set Metric2Key {
	{BRANCHES Branchs} 
	{VERSIONS Versions} 
	{CREATED Created} 
	{IDENTICAL Identical} 
	{ADDED Added} 
	{REMOV Removed} 
	{MODIF Modified}
	{COMMENT Comment text}
}
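
The effect of the Separator and Delimiter settings can be reproduced with Python's standard csv module. This is only an analogy for the behaviour described in the comments above, using ; as the separator and " as the delimiter. The Data Provider itself is configured in Tcl and may parse its input differently.

```python
import csv

line = '"A text containing ; the separator";no problem;end'

# Without a delimiter, splitting on the separator yields 4 pieces.
naive = line.split(";")

# With " declared as the delimiter (quotechar), 3 values are found.
parsed = next(csv.reader([line], delimiter=";", quotechar='"'))

# A doubled delimiter inside a delimited value stands for the delimiter itself.
escaped_line = '"A text containing "" the delimiter";no problemo;end'
escaped = next(csv.reader([escaped_line], delimiter=";", quotechar='"'))

print(len(naive), parsed, escaped)
```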


==========================
= Sample CSV Input Files =
==========================

Example 1:
==========
 FileOrganisation : header::column
 ArtefactLevel : File
 ArtefactKey   : Path

Path	Branchs	Versions
./foo.c	15	105
./bar.c	12	58
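
To make the header::column layout concrete, the following Python sketch reads a file like Example 1 and renames each column to its measure ID, in the spirit of Metric2Key. The names METRIC_TO_KEY and read_header_column are hypothetical, and a single tab between values is assumed.

```python
import csv
import io

# Hypothetical Python mirror of two Metric2Key entries: KeyName -> MeasureID.
METRIC_TO_KEY = {"Branchs": "BRANCHES", "Versions": "VERSIONS"}

def read_header_column(text, artefact_key, separator="\t"):
    """Return {artefact path: {MeasureID: value}} for a header::column file."""
    result = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=separator):
        measures = {
            METRIC_TO_KEY[key]: value
            for key, value in row.items()
            if key in METRIC_TO_KEY  # columns without a mapping are skipped
        }
        result[row[artefact_key]] = measures
    return result

sample = "Path\tBranchs\tVersions\n./foo.c\t15\t105\n./bar.c\t12\t58\n"
imported = read_header_column(sample, artefact_key="Path")
print(imported)
```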

Example 2:
==========
 FileOrganisation : alternate::line
 ArtefactLevel : File
 ArtefactKey   : Path

Path	./foo.c	Branchs	15	Versions	105
Path	./bar.c	Branchs	12	Versions	58
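
In the alternate::line layout, each line carries its own keys. A minimal Python sketch of one way to read it follows. The function read_alternate_line is a hypothetical name, and values are assumed to be separated by a single tab.

```python
def read_alternate_line(text, artefact_key, separator="\t"):
    """Return {artefact path: {key: value}} for an alternate::line file."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        cells = line.split(separator)
        # Even positions hold keys, odd positions hold the matching values.
        pairs = dict(zip(cells[0::2], cells[1::2]))
        path = pairs.pop(artefact_key)
        records[path] = pairs
    return records

sample = (
    "Path\t./foo.c\tBranchs\t15\tVersions\t105\n"
    "Path\t./bar.c\tBranchs\t12\tVersions\t58\n"
)
records = read_alternate_line(sample, artefact_key="Path")
print(records)
```

The result holds the same data as the header::column file of Example 1.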

Example 3:
==========
 FileOrganisation : header::column
 ArtefactLevel : Application

ChangeRequest	Corrected	Open
27	15	11

Example 4:
==========
 FileOrganisation : alternate::column
 ArtefactLevel : Application

ChangeRequest	15
Corrected	11

Example 5:
==========
 FileOrganisation : alternate::column
 ArtefactLevel : File
 ArtefactKey   : Path

Path	./foo.c
Branchs	15
Versions	105
Path	./bar.c
Branchs	12
Versions	58
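
For the alternate::column layout at file level, one plausible reading is that a new artefact starts each time the ArtefactKey appears. The Python sketch below relies on that assumption, which this document does not state explicitly. The function read_alternate_column is a hypothetical name.

```python
def read_alternate_column(text, artefact_key, separator="\t"):
    """Return {artefact path: {key: value}} for an alternate::column file."""
    records = {}
    current = None  # metrics of the artefact currently being read
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        key, value = line.split(separator, 1)
        if key == artefact_key:
            # Assumption: the artefact key opens a new group of metrics.
            current = records.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return records

sample = (
    "Path\t./foo.c\nBranchs\t15\nVersions\t105\n"
    "Path\t./bar.c\nBranchs\t12\nVersions\t58\n"
)
by_file = read_alternate_column(sample, artefact_key="Path")
print(by_file)
```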

Example 6:
==========
 FileOrganisation : header::column
 ArtefactLevel : Function
 ArtefactKey   : Path
 FunctionKey   : Name

Path	Name	Decisions	Tested
./foo.c	end_game(int*,int*)	15	3
./bar.c	bar(char)	12	6

Working With Paths:
===================

- Path separators are unified: you do not need to worry about handling differences between Windows and Linux
- When the option PathsAreCaseInsensitive is set to true, case is ignored when searching for files in the Squore internal data
- Paths known by Squore are relative paths starting at the root of what was specified in the repository connector during the analysis. This relative path is the one used to match with a path in a CSV file.

Here is a valid example of file matching:
  1. You provide C:\A\B\C\D as the root folder in a repository connector.
  2. C:\A\B\C\D contains E\e.c, so Squore knows this file as E/e.c.
  3. You provide a CSV file produced on Linux and containing /tmp/X/Y/E/e.c as a path.
     Squore is then able to match it with the known file.

Squore uses the longest possible match.
In case of conflict, no file is found and a message is sent to the log.
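
The matching rules above can be approximated with the Python sketch below. It is an interpretation and not Squore's actual algorithm: "longest possible match" is assumed to mean the largest number of identical trailing path segments, and a conflict is assumed to be a tie between several known files. The function names are hypothetical.

```python
def split_path(path, case_insensitive=False):
    """Unify separators and split a path into its segments."""
    unified = path.replace("\\", "/")
    if case_insensitive:
        unified = unified.lower()
    return [part for part in unified.split("/") if part not in ("", ".")]

def common_suffix_length(a, b):
    """Count the identical segments at the end of both segment lists."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count

def match_path(csv_path, known_paths, case_insensitive=False):
    """Return the known path with the longest trailing match, or None."""
    target = split_path(csv_path, case_insensitive)
    best, best_len, conflict = None, 0, False
    for known in known_paths:
        n = common_suffix_length(target, split_path(known, case_insensitive))
        if n > best_len:
            best, best_len, conflict = known, n, False
        elif n == best_len and n > 0:
            conflict = True  # several files match equally well
    return None if conflict else best

known = ["E/e.c", "F/e.c"]
print(match_path("/tmp/X/Y/E/e.c", known))
print(match_path("e.c", known))
```

In this sketch, /tmp/X/Y/E/e.c resolves to E/e.c, while a bare e.c is ambiguous between the two known files and is therefore not matched.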