Welcome to datavoreclient’s documentation!

Datavoreclient is a Python API for retrieving data from several services. With this API you can communicate with Datavore, Datalaps, Datahive and Sentinel_1.

Some functions of this API return an image. To display it you can use an external library such as the Image module of PIL. E.g.:

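import datavoreclient
from io import BytesIO  # the image is raw bytes, so wrap it in a bytes buffer
from PIL import Image

dc = datavoreclient.Datavore()
image = dc.getImage(...)
# Decode the returned bytes and open them in the default image viewer
Image.open(BytesIO(image)).show()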

Or, if you use Jupyter, you can simply write:

import datavoreclient
from IPython.display import Image, display
dc = datavoreclient.Datavore()
image = dc.getImage(...)
display(Image(image))

Some functions also return pandas Series. You can display them with the pandas function plot(). Example:

data = dlaps.getTimeserie(index='2000', varname='SULHF', coord='-27.42,19.97', datelaps='19000101010101,21000101010101', groupby=None)
data.plot()

More information at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html

Datavore

class datavore.Datavore(service=None)

Class to communicate with Datavore. Datavore is a visualization tool that is able to look for data from different satellites and display the information as a map with customization options.

getImage(index, varname, date, bbox='global', vMin='default', vMax='default', cmap='default', nearest=None, offset='default', resolution='default', filters={}, paramsPlot='default')

Returns an image generated by Datavore. Works the same as the Datavore web interface, except that you pass the parameters yourself.

Parameters:
  • index (str) – Name of the index to get data from.
  • varname (str) – Name of the variable of the index to get data from.
  • date (str) –

    Formatted string representing a date.

    Format: YYYYMMDDhhmmss. E.g.: 20120815125959.

  • bbox ([float] or string) –

    Corresponds to a rectangle of coordinates

    Array : [x1, y1, x2, y2] or string : “x1, y1, x2, y2”.

  • vMin (int) – Minimum value of the scale.
  • vMax (int) – Maximum value of the scale.
  • cmap (str) –

    Change the color of the map.

    Available styles : jet, bwr, viridis, inferno, plasma, magma, Blues, bone, cool, autumn, s3pcpn

  • nearest (int) – If there is no data at the exact given date, nearest is how far, in seconds, Datavore will look for data around that date.
  • offset (int) – Offset the value of the data by this amount
  • resolution (int) – Resolution of the image returned
  • filters – Probably unused.
  • paramsPlot (dictionary or str) –

    Add some options for image rendering.

    E.g.: {"projection": "pc_world", "radius": None, "plot_method": "pcolormesh", "resample": True}

Returns:

An image representing a map generated by Datavore.

Return type:

image

Example of use:
getImage('ISAS13>v3', 'PSAL', '20120815000000', bbox=[-60,3,-22,27])
returns a map of the variable PSAL of the index ISAS13>v3 within the bounding box [-60,3,-22,27] on 15 August 2012.

getData(index, varname, date, bbox='global', vMin='default', vMax='default', cmap='default', nearest=None, offset='default', resolution='default', filters={}, paramsPlot='default', liste=False)

Returns data generated from Datavore. Works the same as getImage (see above), except it returns data instead of a map.
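
Both getImage and getData expect date as a YYYYMMDDhhmmss string, and nearest is expressed in seconds around that date. The following is a minimal sketch, using only the Python standard library, of building such a string from a datetime and of the time window that nearest would cover. The helpers format_date and nearest_window are hypothetical and are not part of datavoreclient. The symmetric window is an assumption based on the wording "around the given date".

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDhhmmss


def format_date(moment):
    """Hypothetical helper: format a datetime as YYYYMMDDhhmmss."""
    return moment.strftime(DATE_FORMAT)


def nearest_window(date, nearest):
    """Hypothetical helper: (start, end) strings for date +/- nearest seconds.

    Assumes the search is symmetric around the date, which the
    documentation suggests but does not state explicitly.
    """
    center = datetime.strptime(date, DATE_FORMAT)
    delta = timedelta(seconds=nearest)
    return format_date(center - delta), format_date(center + delta)


date = format_date(datetime(2012, 8, 15, 12, 59, 59))
print(date)                        # 20120815125959
print(nearest_window(date, 3600))  # ('20120815115959', '20120815135959')
```

The resulting string can then be passed as the date argument of getImage or getData.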

getImageFromUrl(url)

Returns an image from a Datavore URL. Copy and paste the URL of the Datavore web page after generating your data map, and the function should return the same image.

Parameters: url (str) – URL of the Datavore web page.
Returns: An image representing a map generated by Datavore.
Return type: image

getFileList(index, varname, bbox='global', startdate='19500917', stopdate='21000917', dl=0, inv='null', filters='null')

Returns a list of the files used to search for data with given parameters.

Parameters:
  • index (str) – Name of the index to get data from.
  • varname (str) – Name of the variable of the index to get data from.
  • startdate (str) –

    Formatted string representing a date.

    Format: YYYYMMDDhhmmss. E.g.: 20120815125959.

  • stopdate (str) – Same as startdate.
  • bbox ([float] or string) –

    Corresponds to a rectangle of coordinates

    Array : [x1, y1, x2, y2] or String : “x1, y1, x2, y2”.

  • filters – Probably unused.
Returns:

A list of the files used to search for data with given parameters.

Return type:

array
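
The bbox parameter is documented as either an array [x1, y1, x2, y2] or a string "x1, y1, x2, y2". The sketch below shows one possible way to validate a bounding box and convert between the two forms before calling the client. normalize_bbox and bbox_to_string are hypothetical helpers, not part of datavoreclient. How the client itself processes the value, including the 'global' default, is not described in this documentation.

```python
def normalize_bbox(bbox):
    """Hypothetical helper: return bbox as a list of four floats.

    Accepts a list/tuple [x1, y1, x2, y2] or a string "x1, y1, x2, y2".
    The special value 'global' is returned unchanged.
    """
    if bbox == "global":
        return bbox
    if isinstance(bbox, str):
        parts = bbox.split(",")
    else:
        parts = list(bbox)
    if len(parts) != 4:
        raise ValueError("bbox needs exactly four values: x1, y1, x2, y2")
    return [float(p) for p in parts]


def bbox_to_string(bbox):
    """Hypothetical helper: render a bbox as the string "x1,y1,x2,y2"."""
    normalized = normalize_bbox(bbox)
    if normalized == "global":
        return normalized
    return ",".join(str(v) for v in normalized)


print(normalize_bbox("-60, 3, -22, 27"))  # [-60.0, 3.0, -22.0, 27.0]
print(bbox_to_string([-60, 3, -22, 27]))  # -60.0,3.0,-22.0,27.0
```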

getIndexes()

Returns: The indexes available with their configuration.
Return type: dictionary

getListIndexes()

Returns: List of indexes available.
Return type: array

printListIndexes()

Prints the list of available indexes.

DatavoreSentinel1

class datavore.Datavoresentinel1(service=None)

Class to get data from the Sentinel 1 satellite.

getImage(index, varname, date, bbox='global')

Generates and returns a map from Sentinel 1.

Parameters:
  • index (str) – Name of the index to get data from. For sentinel 1, it is either “S1AWV.v3<” or “S1BWV.v3<”.
  • varname (str) –

    Name of the variable of the index to get data from. Possible values :

    l1_roughness,

    S1A_WV_OCN__2S_realcrossspectra_s1a,

    S1B_WV_OCN__2S_realcrossspectra_s1b,

    S1A_WV_OCN__2S_imaginarycrossspectra_s1a,

    S1B_WV_OCN__2S_imaginarycrossspectra_s1b,

    S1A_WV_OCN__2S_ocean_swell_spectra_s1a,

    S1B_WV_OCN__2S_ocean_swell_spectra_s1b,

    S1A_WV_OCN__2S_ww3,

    S1B_WV_OCN__2S_ww3 (not working).

    The variable name has to match the satellite (S1A or S1B) of the index.

  • date (str) – Formatted string representing a date. Format: YYYYMMDDhhmmss. E.g.: 20120815125959.
  • bbox ([float] or string) – Corresponds to a rectangle of coordinates like so : [x1, y1, x2, y2] or like so : “x1, y1, x2, y2”.
Returns:

An image representing a map generated by Sentinel 1.

Return type:

image
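
Since the variable name has to match the satellite of the index, it can be useful to check the pair before sending a request. The following sketch is one way to do that. varname_matches_index is a hypothetical helper, not part of datavoreclient, and the rule it applies (a variable name starting with the other satellite's tag is rejected, untagged names such as l1_roughness are accepted) is an assumption drawn from the list of values above.

```python
def varname_matches_index(index, varname):
    """Hypothetical check: does varname belong to the satellite of index?

    The index is expected to start with 'S1A' or 'S1B'. Variable names
    without a satellite tag (such as 'l1_roughness') are accepted for both.
    """
    satellite = index[:3].upper()
    if satellite not in ("S1A", "S1B"):
        raise ValueError("index should start with S1A or S1B")
    other = "S1B" if satellite == "S1A" else "S1A"
    return not varname.upper().startswith(other)


print(varname_matches_index("S1AWV.v3<", "S1A_WV_OCN__2S_ww3"))  # True
print(varname_matches_index("S1AWV.v3<", "S1B_WV_OCN__2S_ww3"))  # False
print(varname_matches_index("S1BWV.v3<", "l1_roughness"))        # True
```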

getFileList(bbox='global', startdate='19500917', stopdate='21000917', dl=0, inv='null', filters='null')

Returns a list of the files used to search for data with the given parameters.

Parameters:
  • bbox ([float] or string) –

    Corresponds to a rectangle of coordinates

    Array : [x1, y1, x2, y2] or String : “x1, y1, x2, y2”.

  • startdate (str) –

    Formatted string representing a date.

    Format: YYYYMMDDhhmmss. E.g.: 20120815125959.

  • stopdate (str) – Same as startdate.
  • filters – Probably unused.
Returns:

A list of the files used to search for data with given parameters.

Return type:

array

Datalaps

class datavore.Datalaps(datavoreService=None, datalapsService=None)

Just like Datavore, Datalaps is a visualization tool. It can look for data from different satellites and display the information as a map. The main difference is that it can search for data over a time span and aggregate the information by time period.

getImage(index, varname, bbox, datelaps='19500917, 21000917', aggregate='avg')

Returns an image generated by Datalaps.

Caution: large areas are very slow to load. Please use Datavore to display large maps.

Parameters:
  • index (str) – Name of the index to get data from.
  • varname (str) – Name of the variable of the index to get data from.
  • bbox (string or [float]) –

    Corresponds to a rectangle of coordinates

    Array : [x1, y1, x2, y2] or String : “x1, y1, x2, y2”.

  • datelaps (str or [int] or [datetime]) –

    Formatted string representing a lapse of time.

    • Format: two dates separated by a comma, each written as YYYYMMDD, optionally followed by hhmmss. E.g.: ‘19500917,21000917’ (default).
    • It can also be an array of int. E.g.: [19500917,21000917].
    • It can also be an array of datetime. E.g.: [datetime.date(1900, 12, 5),datetime.date(2100, 12, 5)].

  • aggregate (str) –

    String representing the function to use to aggregate the data of the time lapse. The aggregated data will be used to generate the map. Possible values :

    • ‘avg’ : average (default)
    • ‘max’ : maximum
    • ‘min’ : minimum
    • ‘count’ : number of results
    • ‘std’ : presumably the standard deviation
Returns:

An image representing the map of the data aggregated.

Return type:

image
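
The datelaps parameter accepts three forms: a 'start,stop' string, an array of two ints, or an array of two datetime objects. The sketch below shows how these three forms can be brought to the same string representation. datelaps_to_string is a hypothetical helper, not part of datavoreclient, and the conversion the client performs internally is not described here, so this is only an illustration of how the forms relate to each other.

```python
import datetime


def datelaps_to_string(datelaps):
    """Hypothetical helper: render a datelaps value as 'start,stop'.

    Accepts the three documented forms: a 'start,stop' string, a pair of
    ints, or a pair of datetime.date / datetime.datetime objects.
    """
    if isinstance(datelaps, str):
        parts = [p.strip() for p in datelaps.split(",")]
    else:
        parts = []
        for bound in datelaps:
            # datetime.datetime is a subclass of datetime.date: test it first.
            if isinstance(bound, datetime.datetime):
                parts.append(bound.strftime("%Y%m%d%H%M%S"))
            elif isinstance(bound, datetime.date):
                parts.append(bound.strftime("%Y%m%d"))
            else:
                parts.append(str(bound))
    if len(parts) != 2:
        raise ValueError("datelaps needs a start and a stop date")
    return ",".join(parts)


print(datelaps_to_string("19500917, 21000917"))  # 19500917,21000917
print(datelaps_to_string([19500917, 21000917]))  # 19500917,21000917
print(datelaps_to_string([datetime.date(1900, 12, 5),
                          datetime.date(2100, 12, 5)]))  # 19001205,21001205
```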

getTimeserie(index, varname, coord, datelaps='19500917, 21000917', groupby=None, aggregate='avg')

Same as getTimeserieRaw(), but returns the time series as a pandas object.

Parameters:
  • index (str) – Name of the index to get data from.
  • varname (str) – Name of the variable of the index to get data from.
  • coord (string or [float]) –

    Corresponds to a position OR a rectangle of coordinates

    If a rectangle is passed, the data of the rectangle will be aggregated by the aggregate function given in parameter.

    Position : Array : [x1, y1] or String : “x1, y1”.

    Rectangle : Array : [x1, y1, x2, y2] or String : “x1, y1, x2, y2”.

  • datelaps (str or [int] or [datetime]) –

    Formatted string representing a lapse of time.

    • Format: two dates separated by a comma, each written as YYYYMMDD, optionally followed by hhmmss. E.g.: ‘19500917,21000917’ (default).
    • It can also be an array of int. E.g.: [19500917,21000917].
    • It can also be an array of datetime. E.g.: [datetime.date(1900, 12, 5),datetime.date(2100, 12, 5)].

  • groupby (str) –

    Group by a period of time. Possible values :

    • ‘yyyy’ : group by year
    • ‘yyyy-MM’ : group by month
    • ‘yyyy-MM-dd’ : group by day
    • None (default) : no grouping, all results
  • aggregate (str) –

    String representing the function to use to aggregate the data of the group by. Possible values :

    • ‘avg’ : average (default)
    • ‘max’ : maximum
    • ‘min’ : minimum
    • ‘count’ : number of results
    • ‘std’ : presumably the standard deviation
Returns:

[timestamp, value] tuples

Return type:

pandas

You can use plot() to display the data in a graph.
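
The coord parameter of getTimeserie() appears to distinguish a position from a rectangle by the number of values it holds: two for a position, four for a rectangle. A minimal sketch of that distinction follows. parse_coord is a hypothetical helper, not part of datavoreclient, and the client may validate its input differently.

```python
def parse_coord(coord):
    """Hypothetical helper: classify coord as a position or a rectangle.

    Returns ('position', [x1, y1]) or ('rectangle', [x1, y1, x2, y2]).
    """
    if isinstance(coord, str):
        values = [float(v) for v in coord.split(",")]
    else:
        values = [float(v) for v in coord]
    if len(values) == 2:
        return "position", values
    if len(values) == 4:
        return "rectangle", values
    raise ValueError("coord needs 2 values (position) or 4 (rectangle)")


print(parse_coord("-27.42,19.97"))     # ('position', [-27.42, 19.97])
print(parse_coord([-60, 3, -22, 27]))  # ('rectangle', [-60.0, 3.0, -22.0, 27.0])
```

When a rectangle is passed, the documentation states that the values inside it are combined with the aggregate function.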

mergeTimeserie(pandas)

Merges pandas dataframes into one dataframe. It simply uses the pandas function join. See http://pandas.pydata.org/pandas-docs/stable/merging.html

Parameters: pandas ([pandas]) – List of pandas dataframes to merge.
Returns: A single merged dataframe.
Return type: pandas

getMergedTimeserie(indexTuples, coord, datelaps='19500917, 21000917', groupby=None, aggregate='avg')

For each index/varname tuple in indexTuples, this function will call getTimeserie() with the given parameters and merge the results into the same pandas dataframe.

Parameters:
  • indexTuples ([[string,string]]) – list of tuples [index, varname]
  • otherArguments – see getTimeserie()
Returns:

a single merged dataframe.

Return type:

pandas
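
The loop-and-merge behaviour described for getMergedTimeserie() can be pictured with plain pandas. The sketch below replaces the real service call with a stand-in function, fake_get_timeserie, that returns made-up data, and all index and variable names in it are hypothetical. It joins the frames on their index with DataFrame.join. The use of an outer join is an assumption, since the documentation only says that join is used.

```python
import pandas as pd


def fake_get_timeserie(index, varname):
    """Stand-in for getTimeserie(): returns a small made-up DataFrame."""
    samples = {
        "var_a": (["2012-01-01", "2012-02-01"], [1.0, 2.0]),
        "var_b": (["2012-02-01", "2012-03-01"], [10.0, 30.0]),
    }
    dates, values = samples[varname]
    return pd.DataFrame({varname: values}, index=pd.to_datetime(dates))


def merged_timeserie(index_tuples):
    """Fetch one frame per (index, varname) pair and join them on the index."""
    frames = [fake_get_timeserie(index, varname)
              for index, varname in index_tuples]
    merged = frames[0]
    for frame in frames[1:]:
        # An outer join keeps every timestamp; missing values become NaN.
        merged = merged.join(frame, how="outer")
    return merged


merged = merged_timeserie([("index_a", "var_a"), ("index_b", "var_b")])
print(merged)
```

The result has one column per variable and one row per timestamp found in any of the series.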

getTimeserieRaw(index, varname, coord, datelaps='19500917, 21000917', groupby=None, aggregate='avg')

Returns the time series over a time lapse at a position or within a rectangle of coordinates. This is the same as getTimeserie(), but it returns the raw data (a list of tuples) rather than a pandas object.

Parameters:
  • index (str) – Name of the index to get data from.
  • varname (str) – Name of the variable of the index to get data from.
  • coord (string or [float]) –

    Corresponds to a position OR a rectangle of coordinates

    If a rectangle is passed, the data of the rectangle will be aggregated by the aggregate function given in parameter.

    Position : Array : [x1, y1] or String : “x1, y1”.

    Rectangle : Array : [x1, y1, x2, y2] or String : “x1, y1, x2, y2”.

  • datelaps (str or [int] or [datetime]) –

    Formatted string representing a lapse of time.

    • Format: two dates separated by a comma, each written as YYYYMMDD, optionally followed by hhmmss. E.g.: ‘19500917,21000917’ (default).
    • It can also be an array of int. E.g.: [19500917,21000917].
    • It can also be an array of datetime. E.g.: [datetime.date(1900, 12, 5),datetime.date(2100, 12, 5)].

  • groupby (str) –

    Group by a period of time. Possible values :

    • ‘yyyy’ : group by year
    • ‘yyyy-MM’ : group by month
    • ‘yyyy-MM-dd’ : group by day
    • None (default) : no grouping, all results
  • aggregate (str) –

    String representing the function to use to aggregate the data of the group by. Possible values :

    • ‘avg’ : average (default)
    • ‘max’ : maximum
    • ‘min’ : minimum
    • ‘count’ : number of results
    • ‘std’ : presumably the standard deviation
Returns:

list of [timestamp, value] tuples

Return type:

array
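
To make the meaning of groupby and aggregate concrete, the sketch below applies the same idea locally to a list of (timestamp, value) tuples. It is only an illustration: in the real API the grouping is requested through the parameters and the result comes back already aggregated. group_and_aggregate is a hypothetical helper, the sample values are made up, and representing timestamps as datetime objects is an assumption, since the timestamp format of the raw data is not documented here.

```python
from collections import defaultdict
from datetime import datetime

# strftime patterns assumed to correspond to the documented groupby values.
GROUPBY_PATTERNS = {"yyyy": "%Y", "yyyy-MM": "%Y-%m", "yyyy-MM-dd": "%Y-%m-%d"}

AGGREGATES = {
    "avg": lambda values: sum(values) / len(values),
    "max": max,
    "min": min,
    "count": len,
}


def group_and_aggregate(rows, groupby, aggregate="avg"):
    """Illustrative only: group (datetime, value) rows, aggregate each group."""
    groups = defaultdict(list)
    for timestamp, value in rows:
        groups[timestamp.strftime(GROUPBY_PATTERNS[groupby])].append(value)
    func = AGGREGATES[aggregate]
    return [(key, func(values)) for key, values in sorted(groups.items())]


rows = [
    (datetime(2012, 1, 5), 2.0),
    (datetime(2012, 1, 20), 4.0),
    (datetime(2012, 2, 1), 10.0),
]
print(group_and_aggregate(rows, "yyyy-MM"))        # [('2012-01', 3.0), ('2012-02', 10.0)]
print(group_and_aggregate(rows, "yyyy", "count"))  # [('2012', 3)]
```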

getUrl(urlFormat, index, varname, coord, datelaps='19500917, 21000917', groupby=None, aggregate='avg')

Creates and returns a URL to get the information from Datalaps, using the urlFormat given as a parameter.

DataHive

class datavore.Datahive

Hive is a data warehouse infrastructure built on Hadoop. It supports analysis, querying through a language syntactically close to SQL, and data summarization.

executeQueryPYHS2(query, host='br156-165', port=10000, authMechanism='PLAIN', user='root', password='test', database='default')

Executes a query.

Note (Fred): retrieving results is very slow, around 1000 records/s, which means hours of waiting for millions of records. Switched to impyla.

Parameters: query (string) – The query to execute.

executeQuery(query, host='br156-165', port=10000, auth_mechanism='PLAIN', user='root', password='test', database='default', output_pandas=True, queue_name='hive1')