
Introduction

Before you can create, read, update, or delete data, you need a database. In this tutorial you will learn how to create a database and then import data into collections.

There are two types of collections in ArangoDB:

  • Document Collections: These collections store JSON documents, which can be accessed via key-value operations, document queries, search, or graph queries. Each document in these collections has at least a _key attribute and can have zero or more additional attributes.

  • Edge Collections: These collections store edge documents in JSON format. They support all the operations you can perform on a document collection, but are additionally optimized for graph traversals. In addition to the _key attribute, edge documents require _from and _to attributes, which point to the source and target vertices (documents) that form the two ends of the edge.
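To make the difference between the two collection types concrete, here is a minimal Python sketch that reports which required attributes a stored document is missing. The helper `missing_attributes` is hypothetical and not part of any ArangoDB driver. It only encodes the rules described above, and it treats _key as required because every stored document has one (ArangoDB generates it if you omit it on insert).

```python
def missing_attributes(doc: dict, is_edge: bool) -> list:
    """Return the required attributes that `doc` lacks.

    Hypothetical helper: document collections require _key; edge
    collections additionally require _from and _to.
    """
    required = ["_key"]
    if is_edge:
        required += ["_from", "_to"]
    return [name for name in required if name not in doc]


# Trimmed versions of the sample documents used in this tutorial.
airport = {"_key": "JFK", "name": "John F Kennedy Intl"}
flight = {"_key": "25471", "_from": "airports/BIS"}  # _to deliberately omitted

print(missing_attributes(airport, is_edge=False))  # → []
print(missing_attributes(flight, is_edge=True))    # → ['_to']
```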

Data files

We will work with the flights dataset throughout the Query tutorials. The full dataset can be downloaded here:

JSON Downloads

JSONL Downloads

Database Dump

  • Full Database Dump with flights Edge Collection, airport and points-of-interest Document Collections, and Search Views
Note

The points-of-interest data is a limited, US-only extract from the Wikivoyage database dumps.

The content is used and attributed according to the terms of the Creative Commons Attribution-ShareAlike license.

Simply right-click the hyperlinks above and choose "Save As" to save the files to your local file system.

Need to convert between JSON formats?

You can use the command-line JSON processor jq to convert between formats:

JSON to JSONL:

jq -c '.[]' input.json > output.jsonl

JSONL to JSON:

jq -s '.' input.jsonl > output.json
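If jq is not available, the same two conversions can be sketched with Python's standard json module. This sketch assumes the JSON file holds a single top-level array and that the JSONL file has one document per line; the function names are hypothetical.

```python
import json


def json_to_jsonl(src: str, dst: str) -> None:
    """Python analogue of `jq -c '.[]' src > dst`."""
    with open(src, encoding="utf-8") as f:
        docs = json.load(f)  # assumption: a single top-level JSON array
    with open(dst, "w", encoding="utf-8") as out:
        for doc in docs:
            # Compact separators give one-line output, like jq -c.
            out.write(json.dumps(doc, separators=(",", ":"), ensure_ascii=False) + "\n")


def jsonl_to_json(src: str, dst: str) -> None:
    """Python analogue of `jq -s '.' src > dst`."""
    with open(src, encoding="utf-8") as f:
        # Assumption: one JSON document per line; blank lines are skipped.
        docs = [json.loads(line) for line in f if line.strip()]
    with open(dst, "w", encoding="utf-8") as out:
        json.dump(docs, out, indent=2, ensure_ascii=False)
        out.write("\n")
```

For example, json_to_jsonl("input.json", "output.jsonl") mirrors the first jq command. Passing ensure_ascii=False keeps non-ASCII characters (for example in point-of-interest names) as UTF-8 instead of \u escapes, which is closer to what jq prints by default.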

Understanding the Document Model

As with any NoSQL database, data can be modeled in different ways depending on the questions you intend to ask of it. The performance characteristics of your database can differ significantly depending on whether you embed related data (denormalize), reference it (semi-normalize/normalize), or use graph traversals for relationships.

In most cases, denormalization gives you the best performance, at the price of larger data sizes, higher resource utilization, and higher transfer costs.
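The trade-off can be sketched in a few lines of Python: a referenced flight stores only the _id of each airport, while a denormalized copy embeds the airport documents themselves. The airport city/state values and the embedded attribute names origin and destination are illustrative assumptions, not the dataset's actual schema.

```python
import json

# Illustrative, trimmed airport documents keyed by _id. The city/state
# values are assumptions for this sketch, not copied from the dataset.
airports = {
    "airports/BIS": {"_key": "BIS", "city": "Bismarck", "state": "ND"},
    "airports/MSP": {"_key": "MSP", "city": "Minneapolis", "state": "MN"},
}

# Referenced form: the flight stores only the _id of each airport.
flight_ref = {
    "_key": "25471",
    "_from": "airports/BIS",
    "_to": "airports/MSP",
    "Distance": 386,
}


def denormalize(flight: dict, airports_by_id: dict) -> dict:
    """Return a new flight dict with copies of both airports embedded.

    The attribute names "origin" and "destination" are hypothetical.
    """
    embedded = dict(flight)
    embedded["origin"] = dict(airports_by_id[flight["_from"]])
    embedded["destination"] = dict(airports_by_id[flight["_to"]])
    return embedded


flight_embedded = denormalize(flight_ref, airports)

# Reading the origin city needs no second lookup...
print(flight_embedded["origin"]["city"])  # → Bismarck
# ...but every flight now carries its own copy of the airport data.
print(len(json.dumps(flight_ref)), len(json.dumps(flight_embedded)))
```

In the embedded form, changing an airport's details means rewriting every flight that embeds it, which is part of the resource cost described above.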

For simplicity, these labs use the flights dataset to teach you the fundamentals of querying in ArangoDB.

Document Model


Sample Documents

In our dataset, airports are modeled as documents, while flights are modeled as edge documents that connect two airports, each edge representing a scheduled flight.

Airports

{
  "_key": "JFK",
  "_id": "airports/JFK",
  "_rev": "_YOO08KG-_T",
  "name": "John F Kennedy Intl",
  "city": "New York",
  "state": "NY",
  "country": "USA",
  "lat": 40.63975111,
  "long": -73.77892556,
  "vip": true
}

Flights

{
  "_key": "25471",
  "_id": "flights/25471",
  "_from": "airports/BIS",
  "_to": "airports/MSP",
  "_rev": "_YOO8JXG--f",
  "Year": 2008,
  "Month": 1,
  "Day": 2,
  "DayOfWeek": 3,
  "DepTime": 1055,
  "ArrTime": 1224,
  "DepTimeUTC": "2008-01-02T16:55:00.000Z",
  "ArrTimeUTC": "2008-01-02T18:24:00.000Z",
  "UniqueCarrier": "9E",
  "FlightNum": 5660,
  "TailNum": "85069E",
  "Distance": 386
}
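Because each flight edge stores the _id of its two airports, following edges amounts to looking up _from and _to. The Python sketch below builds an in-memory outbound index and walks it breadth-first, which is conceptually similar to (but far simpler than) an ArangoDB graph traversal. Only the first edge mirrors the sample flight above; the other edges and their keys are hypothetical.

```python
from collections import defaultdict

# The first edge mirrors the sample flight; the others are hypothetical.
flights = [
    {"_key": "25471", "_from": "airports/BIS", "_to": "airports/MSP"},
    {"_key": "f-2", "_from": "airports/MSP", "_to": "airports/JFK"},
    {"_key": "f-3", "_from": "airports/BIS", "_to": "airports/MSP"},
]


def outbound_index(edges):
    """Map each _from vertex to the set of _to vertices it connects to."""
    index = defaultdict(set)
    for edge in edges:
        index[edge["_from"]].add(edge["_to"])
    return index


def reachable(index, start, max_depth):
    """Vertices reachable from `start` in 1..max_depth outbound hops.

    The start vertex itself is excluded from the result.
    """
    seen = set()
    frontier = {start}
    for _ in range(max_depth):
        # Next hop: every target of every frontier vertex, minus vertices
        # already visited and the start vertex.
        frontier = {
            target for vertex in frontier for target in index.get(vertex, ())
        } - seen - {start}
        seen |= frontier
    return seen


index = outbound_index(flights)
print(sorted(reachable(index, "airports/BIS", 2)))  # → ['airports/JFK', 'airports/MSP']
```

Two flights share the same BIS to MSP route here; the index stores each destination once, while the edge collection itself keeps every scheduled flight as its own edge document.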

ArangoDB Query UI

We will use the ArangoDB query interface throughout the query tutorials.

  • (1) The query interface can be reached by clicking QUERY in the left navigation.
  • (2) The query text area is where you type the query.
  • (3) The query can be executed by clicking the Execute button or by pressing Cmd + Enter (Mac) or Ctrl + Enter (Linux/Windows).

Figure: Query Interface

Fundamentals

  • ArangoDB reserves some attribute names for system use. It's common to set the _key yourself, though the system autogenerates a _key if you do not specify one.

    • _key: Unique identifier for the document within its collection
    • _id: Unique identifier for the document within the database; it's a concatenation of the collection name and the key, separated by a slash, e.g. `flights/
    • _rev: Unique revision id for the document (used for compare-and-swap (CAS) / optimistic locking)
    • _from: Applicable only to edge collections; the _id of the start vertex
    • _to: Applicable only to edge collections; the _id of the end vertex
  • Document keys must be unique within a collection, must be strings of at most 254 bytes, and, besides letters and digits, can contain these special characters:

    _-:.@()+,=;$!*'%
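The key and _id rules above can be expressed as a small Python check. This is a sketch, not ArangoDB's own validation: it assumes "letters" means ASCII a-z and A-Z, it does not check uniqueness (that depends on what is already in the collection), and the helper names are hypothetical.

```python
import re

# Letters, digits, and the special characters listed above.
# Assumption: only ASCII letters are allowed.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_\-:.@()+,=;$!*'%]+")
MAX_KEY_BYTES = 254


def is_valid_key(key: str) -> bool:
    """Check a candidate _key against the character and length rules."""
    return (
        isinstance(key, str)
        and len(key.encode("utf-8")) <= MAX_KEY_BYTES
        and KEY_PATTERN.fullmatch(key) is not None
    )


def make_id(collection: str, key: str) -> str:
    """Build an _id: the collection name and the _key joined by a slash."""
    if not is_valid_key(key):
        raise ValueError(f"invalid _key: {key!r}")
    return f"{collection}/{key}"


print(make_id("flights", "25471"))  # → flights/25471
print(is_valid_key("has space"))    # → False
```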

Examples

Document:

Figure: Sample Document

Edge:

Figure: Sample Edge Document
