Title: | Query 'nycflights13'-Like Air Travel Data for Given Years and Airports |
---|---|
Description: | Supplies a set of functions to query air travel data for user-specified years and airports. Datasets include on-time flights, airlines, airports, planes, and weather. |
Authors: | Simon P. Couch [aut, cre], Hadley Wickham [ctb], Jay Lee [ctb], Dennis Irorere [ctb] |
Maintainer: | Simon P. Couch <[email protected]> |
License: | CC0 |
Version: | 0.3.4.9000 |
Built: | 2024-11-04 03:33:25 UTC |
Source: | https://github.com/simonpcouch/anyflights |
This function generates a list of dataframes similar to those found in the nycflights13 data package for any US airports and time frames. Please note that, even with a strong internet connection, this function may take several minutes to download relevant data.
anyflights(station, year, month = 1:12, dir = NULL)
station | A character vector giving the origin US airports of interest (as FAA LID airport codes). |
year | A numeric giving the year of interest. This argument is currently not vectorized, as the datasets for a single year are already very large. Information for the most recent year is usually available by February or March of the following year. |
month | A numeric giving the month(s) of interest. |
dir | An optional character string giving the directory to save datasets in. By default, datasets will not be saved to file. |
The anyflights() function is a wrapper around the following functions:
get_airlines: Grab data to translate between two-letter carrier codes and names
get_airports: Grab data on airport names and locations
get_flights: Grab data on all flights that departed given US airports in a given year and month
get_planes: Grab construction information about each plane
get_weather: Grab hourly meteorological data for a given airport in a given year and month
The recommended approach to download data for many stations (airports) is to supply a vector of stations to the station argument rather than iterating over many calls to anyflights(). The faa column in dataframes outputted by get_airports() provides the FAA LID codes for all supported airports. See ?get_flights for more details on implementation.
A list of dataframes (and, optionally, a directory of datasets) similar to those found in the nycflights13 data package.
get_flights for flight data, get_weather for weather data, get_airlines for airlines data, get_airports for airports data, or get_planes for planes data.
Use the as_flights_package function to convert the output of this function to a data-only package.
# grab data on all flights departing from
# Portland International Airport in June 2018 and
# other useful metadata without saving to file
## Not run:
anyflights("PDX", 2018, 6)

# ...or, grab that same data and opt to save the
# files as well! (tempdir() can be replaced with
# a character string giving the path to a folder)
## Not run:
anyflights("PDX", 2018, 6, tempdir())
The anyflights package supplies a set of functions to generate nycflights13-like datasets and data packages for specified years and airports.
Maintainer: Simon P. Couch [email protected]
Other contributors:
Hadley Wickham [email protected] [contributor]
Jay Lee [email protected] [contributor]
Dennis Irorere [email protected] [contributor]
Useful links:
Report bugs at https://github.com/simonpcouch/anyflights/issues
Generate a data-only package, including documentation, from data outputted by the 'anyflights()' function. Please do not submit the resulting package to CRAN or similar repositories as an original package.
as_flights_package(data, name = make.names(deparse(substitute(data))))
data | A named list of dataframes outputted by |
name | The desired name of the resulting package as a character string. The package will check that the supplied package name is valid using the regular expression |
A directory containing a data-only package built around the supplied data.
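The default for name in the usage above is derived from the expression passed as data. A minimal sketch of that mechanism follows. It uses a hypothetical stand-in function, default_name(), that mirrors only the default argument and builds no package. The object pdxflights18 is a toy placeholder, not real output.

```r
# Hypothetical stand-in that mirrors only the default `name` argument of
# as_flights_package(); it does not build a package.
default_name <- function(data, name = make.names(deparse(substitute(data)))) {
  name
}

# Toy placeholder for a list returned by anyflights()
pdxflights18 <- list(flights = data.frame(), weather = data.frame())

# The variable name becomes the default package name
name_from_variable <- default_name(pdxflights18)

# An explicit name overrides the default
name_explicit <- default_name(pdxflights18, name = "pdxflights")
```

Passing the anyflights() call inline would make the default a sanitized version of the call text, so assigning the result to a variable first (or supplying name directly) tends to give a cleaner package name.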
This function generates a dataframe similar to the airlines dataset from nycflights13 for any US airports and time frame. Please note that, even with a strong internet connection, this function may take several minutes to download relevant data.
get_airlines(dir = NULL, flights_data = NULL)
dir | An optional character string giving the directory to save datasets in. By default, datasets will not be saved to file. |
flights_data | Optional: either a filepath as a character string or a dataframe outputted by |
A data frame with <2k rows and 2 variables:
Two- or three-character abbreviation made up of letters or numbers. In cases where the Unique Carrier Code has been used more than once, a suffix is added, e.g. ML, ML (1). This list matches the 'Reporting_Airline' field in the BTS documentation for the flights data set.
Full name
get_flights for flight data, get_weather for weather data, get_airports for airports data, get_planes for planes data, or anyflights for a wrapper function.
Use the as_flights_package function to convert this dataset to a data-only package.
# run with defaults
## Not run:
get_airlines()

# if you'd like to return abbreviations only for
# the airlines that appear in flights, query your
# flights dataset first, and then supply it as
# the flights_data argument
## Not run:
get_airlines(flights_data = get_flights("PDX", 2018, 6))
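The effect of supplying flights_data can be pictured with toy data. This is a sketch of the idea only, not the package's implementation. The column names carrier and name are assumed to follow nycflights13, and both tables are made up.

```r
# Toy stand-ins; column names (carrier, name) are assumed to follow nycflights13
airlines <- data.frame(
  carrier = c("AA", "AS", "DL", "UA"),
  name = c("American Airlines Inc.", "Alaska Airlines Inc.",
           "Delta Air Lines Inc.", "United Air Lines Inc."),
  stringsAsFactors = FALSE
)
flights <- data.frame(carrier = c("AS", "AS", "DL"), stringsAsFactors = FALSE)

# Keep only the carriers that occur in the flights data
airlines_in_flights <- airlines[airlines$carrier %in% flights$carrier, ]
```

Here airlines_in_flights retains the rows for AS and DL, the two carriers present in the toy flights table.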
This function generates a dataframe similar to the airports dataset from nycflights13 for any US airports and time frame. Please note that, even with a strong internet connection, this function may take several minutes to download relevant data.
get_airports(dir = NULL)
dir | An optional character string giving the directory to save datasets in. By default, datasets will not be saved to file. |
A data frame with ~1350 rows and 8 variables:
FAA airport code
Usual name of the airport
Location of airport
Altitude, in feet
Timezone offset from GMT/UTC
Daylight saving time zone. A = Standard US DST: starts on the second Sunday of March, ends on the first Sunday of November. U = unknown. N = no DST.
IANA time zone, as determined by GeoNames webservice
https://openflights.org/data.html
get_flights for flight data, get_weather for weather data, get_airlines for airlines data, get_planes for planes data, or anyflights for a wrapper function.
Use the as_flights_package function to convert this dataset to a data-only package.
# grab airports data
## Not run:
get_airports()
This function generates a dataframe similar to the flights dataset from nycflights13 for any US airport and time frame. Please note that, even with a strong internet connection, this function may take several minutes to download relevant data.
get_flights(station, year, month = 1:12, dir = NULL, ...)
station | A character vector giving the origin US airports of interest (as FAA LID airport codes). |
year | A numeric giving the year of interest. This argument is currently not vectorized, as the datasets for a single year are already very large. Information for the most recent year is usually available by February or March of the following year. |
month | A numeric giving the month(s) of interest. |
dir | An optional character string giving the directory to save datasets in. By default, datasets will not be saved to file. |
... | Currently only used internally. |
This function currently downloads data for all stations for each month supplied, and then filters out data for relevant stations. Thus, the recommended approach to download data for many airports is to supply a vector of airport codes to the station argument rather than iterating over many calls to get_flights().
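The download-then-filter behaviour described above can be illustrated with toy data. This is a sketch under stated assumptions, not the package's internals. The table all_flights is made up and filter_stations() is a hypothetical helper.

```r
# Toy stand-in for one month of flights across all origins
all_flights <- data.frame(
  origin = c("PDX", "SEA", "JFK", "PDX", "LGA"),
  dest = c("SFO", "PDX", "LAX", "SEA", "ORD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose origin is one of the requested stations
filter_stations <- function(flights, station) {
  flights[flights$origin %in% station, , drop = FALSE]
}

# The same full table serves one station or several
pdx_only <- filter_stations(all_flights, "PDX")
pdx_and_sea <- filter_stations(all_flights, c("PDX", "SEA"))
```

Because the costly step (obtaining the full monthly table) is the same however many stations are kept, a single call with a vector of stations avoids repeating it once per station.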
A data frame with ~1k-500k rows and 19 variables:
year, month, day
Date of departure
dep_time, arr_time
Actual departure and arrival times, UTC.
sched_dep_time, sched_arr_time
Scheduled departure and arrival times, UTC.
dep_delay, arr_delay
Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.
hour, minute
Time of scheduled departure broken into hour and minutes.
carrier
Two-letter carrier abbreviation. See get_airlines to get the full name.
tailnum
Plane tail number
flight
Flight number
origin, dest
Origin and destination. See get_airports for additional metadata.
air_time
Amount of time spent in the air, in minutes
distance
Distance between airports, in miles
time_hour
Scheduled date and hour of the flight as a POSIXct date. Along with origin, can be used to join flights data to weather data.
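The join on origin and time_hour mentioned for the last variable might look like the following sketch. Both tables are toy data with made-up values, reduced to the join keys and one value column each, and base R's merge() is used as one possible way to perform the join.

```r
# Toy flights and weather tables; values are invented for illustration
hours <- as.POSIXct(c("2018-06-01 08:00:00", "2018-06-01 09:00:00"), tz = "UTC")

flights <- data.frame(
  origin = c("PDX", "PDX", "PDX"),
  time_hour = hours[c(1, 1, 2)],
  dep_delay = c(-3, 12, 0),
  stringsAsFactors = FALSE
)
weather <- data.frame(
  origin = c("PDX", "PDX"),
  time_hour = hours,
  temp = c(57.9, 60.1),
  stringsAsFactors = FALSE
)

# Left join: every flight keeps its row and gains the matching weather columns
flights_weather <- merge(flights, weather,
                         by = c("origin", "time_hour"), all.x = TRUE)
```

Using all.x = TRUE keeps flights whose hour has no weather record, with NA in the weather columns.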
If you are repeatedly getting a timeout error when downloading flights, this could be because your download is taking longer than R's default timeout option allows. You can change the timeout value for your R session by running the code options(timeout = timeout_value_in_seconds) in your console.
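As a concrete sketch of the advice above, the timeout can be raised for the current session and restored afterwards. The value of 600 seconds is an arbitrary illustration, not a recommendation from the package.

```r
# Raise the download timeout to 10 minutes for this session;
# options() returns the previous values so they can be restored later
old_opts <- options(timeout = 600)
current_timeout <- getOption("timeout")

# ... run get_flights() or anyflights() here ...

# Restore the previous timeout
options(old_opts)
```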
RITA, Bureau of Transportation Statistics, https://www.bts.gov
get_weather for weather data, get_airlines for airlines data, get_airports for airports data, get_planes for planes data, or anyflights for a wrapper function.
Use the as_flights_package function to convert this dataset to a data-only package.
# flights out of Portland International in June 2018
## Not run:
get_flights("PDX", 2018, 6)

# ...or the original nycflights13 flights dataset
## Not run:
get_flights(c("JFK", "LGA", "EWR"), 2013)

# use the dir argument to indicate the folder to
# save the data in as "flights.rda"
## Not run:
get_flights("PDX", 2018, 6, dir = tempdir())
This function generates a dataframe similar to the planes dataset from nycflights13 for any US airports and time frame. Please note that, even with a strong internet connection, this function may take several minutes to download relevant data.
get_planes(year, dir = NULL, flights_data = NULL)
year | A numeric giving the year of interest. This argument is currently not vectorized, as the datasets for a single year are already very large. Information for the most recent year is usually available by February or March of the following year. |
dir | An optional character string giving the directory to save datasets in. By default, datasets will not be saved to file. |
flights_data | Optional: either a filepath as a character string or a dataframe outputted by |
A data frame with ~3500 rows and 9 variables:
Tail number
Year manufactured
Type of plane
Manufacturer and model
Number of engines and seats
Average cruising speed in mph
Type of engine
FAA Aircraft registry, https://www.faa.gov/licenses_certificates/aircraft_certification/aircraft_registry/releasable_aircraft_download
get_flights for flight data, get_weather for weather data, get_airlines for airlines data, get_airports for airports data, or anyflights for a wrapper function.
Use the as_flights_package function to convert this dataset to a data-only package.
# grab airplanes data for 2018
## Not run:
get_planes(2018)

# if you'd like to only return the planes that appear
# in flights, query your flights dataset first,
# and then supply it as a flights_data argument
## Not run:
get_planes(2018, flights_data = get_flights("PDX", 2018, 6))
## End(Not run)
This function generates a dataframe similar to the weather dataset from nycflights13 for any US airports and time frame. Please note that, even with a strong internet connection, this function may take several minutes to download relevant data.
get_weather(station, year, month = 1:12, dir = NULL)
station | A character vector giving the origin US airports of interest (as FAA LID airport codes). |
year | A numeric giving the year of interest. This argument is currently not vectorized, as the datasets for a single year are already very large. Information for the most recent year is usually available by February or March of the following year. |
month | A numeric giving the month(s) of interest. |
dir | An optional character string giving the directory to save datasets in. By default, datasets will not be saved to file. |
A data frame with ~1k-25k rows and 15 variables:
origin
Weather station. Named origin to facilitate merging with flights data.
year, month, day, hour
Time of recording, UTC
temp, dewp
Temperature and dew point, in degrees Fahrenheit
humid
Relative humidity
wind_dir, wind_speed, wind_gust
Wind direction (in degrees), speed and gust speed (in mph)
precip
Precipitation, in inches
pressure
Sea level pressure in millibars
visib
Visibility in miles
time_hour
Date and hour of the recording as a POSIXct date, UTC
ASOS download from Iowa Environmental Mesonet, https://mesonet.agron.iastate.edu/request/download.phtml
get_flights for flight data, get_airlines for airlines data, get_airports for airports data, get_planes for planes data, or anyflights for a wrapper function.
Use the as_flights_package function to convert this dataset to a data-only package.
# query weather at Portland International in June 2018
## Not run:
get_weather("PDX", 2018, 6)

# ...or the original nycflights13 weather dataset
## Not run:
get_weather(c("JFK", "LGA", "EWR"), 2013)

# use the dir argument to indicate the folder to
# save the data in as "weather.rda"
## Not run:
get_weather("PDX", 2018, 6, dir = tempdir())