Lifemapper Species Point Data Preparation

Species point data formatted for Lifemapper is delivered as a compressed data package with a .tar.gz extension, containing two files: a data file in CSV format with a .csv extension, and a metadata file in JSON format with a .json extension.
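
As an illustration, a package like this could be built with Python's standard tarfile module. This is only a sketch: the function name and the file names in the usage note are hypothetical, and placing both files at the top level of the archive (no subdirectories) is an assumption, not something this document specifies.

```python
import tarfile
from pathlib import Path


def package_point_data(csv_path, json_path, out_path):
    """Bundle the data file and metadata file into one gzip-compressed tar archive."""
    with tarfile.open(out_path, "w:gz") as archive:
        for path in (Path(csv_path), Path(json_path)):
            # arcname keeps only the file name, so no directories are stored
            # in the archive (assumed layout).
            archive.add(path, arcname=path.name)
    return out_path
```

For example, package_point_data("mydata.csv", "mydata.json", "mydata.tar.gz") would write mydata.tar.gz containing the two files.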

The data file is a tab-delimited CSV file, with records grouped by taxon using the field designated with the "groupby" role in the metadata. Records MUST be grouped in the input data file; that is, all records sharing the same "groupby" value must appear together. When Lifemapper processes these data, groups of records are written to separate files for Species Distribution Modeling and other analyses.
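
The grouping requirement can be checked before packaging. The sketch below is not part of Lifemapper; find_ungrouped_keys is a hypothetical helper that reads a tab-delimited file and reports any group key that reappears after its run of records has ended. It assumes the "groupby" column is identified by its zero-based index and that the file is UTF-8 encoded.

```python
import csv


def find_ungrouped_keys(csv_path, group_index, has_header=False):
    """Return the group keys whose records are not contiguous in the file."""
    finished = set()  # keys whose run of records has already ended
    broken = []       # keys that show up again after their run ended
    current = None
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            key = row[group_index]
            if key != current:
                if key in finished and key not in broken:
                    broken.append(key)
                if current is not None:
                    finished.add(current)
                current = key
    return broken
```

An empty list means every group is contiguous. If keys are reported, sorting the file on the "groupby" column is one way to restore the grouping.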

The metadata file consists of a JSON-formatted object with one name/value pair for each column. The name is either the zero-based column index or the column name from the header row of the data file. The value is a JSON object with the following name/value pairs:

"name": Required. Name should be a unique value for each column, no longer than 10 characters.
"type": Required. The data type contained in the field. Accepted values: "int", "str", "float"
"role": Required only for fields fulfilling one of the roles listed below, some of which are required and some optional.

The role "groupby" is required, along with either both "latitude" and "longitude" or "geopoint". The role "taxaname" is useful if the "groupby" field is an identifier rather than a human-readable name.

"groupby": Required. Indicates the field used to group records
"longitude": Required if no "geopoint" role. The longitude/x value
"latitude": Required if no "geopoint" role. The latitude/y value
"geopoint": Required if no "latitude" and "longitude" roles. The field contains both x and y coordinates; CSV data will be in the format {"lat": -16.35, "lon": -67.616667}
"uniqueid": Optional. The field contains a unique ID for each record. Values will be generated if missing.
"taxaname": Optional. The field contains the taxon name for the record set. If this role is not present, records will be named with the value of the "groupby" field. If the role is present and records in a group have different values, the value from the first record will be used as the dataset displayname.
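
The rules above can be checked mechanically before submitting a package. The following sketch is not Lifemapper's own validator; check_metadata and ALLOWED_TYPES are hypothetical names. It only tests what this document states: each column has a unique "name", "type" is one of the accepted values, a "groupby" role exists, and coordinates come from either "latitude" plus "longitude" or "geopoint".

```python
import json

ALLOWED_TYPES = {"int", "str", "float"}


def check_metadata(metadata):
    """Return a list of problems found in a column-metadata dict."""
    problems = []
    roles = {}     # role -> list of column keys that claim it
    names = set()  # column names seen so far
    for column, spec in metadata.items():
        name = spec.get("name")
        if not name:
            problems.append(f"column {column}: missing 'name'")
        elif name in names:
            problems.append(f"column {column}: duplicate name '{name}'")
        else:
            names.add(name)
        if spec.get("type") not in ALLOWED_TYPES:
            problems.append(f"column {column}: 'type' must be int, str or float")
        role = spec.get("role")
        if role is not None:
            roles.setdefault(role, []).append(column)
    if "groupby" not in roles:
        problems.append("no column has the 'groupby' role")
    has_lat_lon = "latitude" in roles and "longitude" in roles
    if not (has_lat_lon or "geopoint" in roles):
        problems.append("need 'latitude' and 'longitude' roles, or a 'geopoint' role")
    return problems


# Usage: a minimal metadata object that satisfies the required roles.
sample = json.loads("""
{"0": {"name": "taxonkey", "type": "int", "role": "groupby"},
 "1": {"name": "dec_lat", "type": "float", "role": "latitude"},
 "2": {"name": "dec_long", "type": "float", "role": "longitude"}}
""")
print(check_metadata(sample))  # prints []
```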

Sample JSON metadata file contents:
{"0": {"name": "gbifid", "type": "int", "role": "uniqueid"},
"1": {"name": "occurid", "type": "int"},
"2": {"name": "taxonkey", "type": "int", "role": "groupby"},
"3": {"name": "datasetkey", "type": "str"},
"4": {"name": "puborgkey", "type": "str"},
"5": {"name": "basisofrec", "type": "str"},
"6": {"name": "kingdomkey", "type": "int"},
"7": {"name": "phylumkey", "type": "int"},
"8": {"name": "classkey", "type": "int"},
"9": {"name": "orderkey", "type": "int"},
"10": {"name": "familykey", "type": "int"},
"11": {"name": "genuskey", "type": "int"},
"12": {"name": "specieskey", "type": "int"},
"13": {"name": "sciname", "type": "str", "role": "taxaname"},
"14": {"name": "dec_lat", "type": "float", "role": "latitude"},
"15": {"name": "dec_long", "type": "float", "role": "longitude"},
"16": {"name": "day", "type": "int"},
"17": {"name": "month", "type": "int"},
"18": {"name": "year", "type": "int"},
"19": {"name": "rec_by", "type": "str"},
"20": {"name": "inst_code", "type": "str"},
"21": {"name": "coll_code", "type": "str"},
"22": {"name": "catnum", "type": "str"}
}
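
To show how a metadata object like this relates to the data file, the sketch below reads a tab-delimited file and converts each value according to the "type" declared for its column. It is an illustration only, not how Lifemapper itself parses the data. It assumes the metadata is keyed by zero-based column index (as in the sample), that the data file has no header row, and that empty cells should become None; read_typed_records and CASTS are hypothetical names.

```python
import csv

# Map the accepted metadata "type" values to Python conversion functions.
CASTS = {"int": int, "str": str, "float": float}


def read_typed_records(csv_path, metadata):
    """Yield one dict per row, keyed by metadata 'name', with values cast by 'type'."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            record = {}
            for key, spec in metadata.items():
                raw = row[int(key)]  # metadata keys are column indexes (assumed)
                # Empty cells become None instead of failing on int("") or float("").
                record[spec["name"]] = CASTS[spec["type"]](raw) if raw != "" else None
            yield record
```

With the sample metadata, each yielded record would hold an integer under "taxonkey", floats under "dec_lat" and "dec_long", and so on.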