Monday, March 30, 2020

Mapping COVID-19

With the world currently experiencing the Covid-19 crisis, many of our users are working remotely (aside: for details on how to use EViews at home, visit our Covid licensing page) anxious to follow data on how the virus is spreading across parts of the world. There are many sources of information on Covid-19, and we thought we’d demonstrate how to fetch some of these sources directly into EViews, and then display some graphics of the data. (Please visit our follow up post for a few more graph examples).

Table of Contents

  1. Johns Hopkins Data
  2. European Centre for Disease Prevention and Control Data
  3. New York Times US County Data
  4. Sneak Peaks

Johns Hopkins Data

To begin we'll retrieve data from the Covid-19 Time Series collection from Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering. These data are organized into three csv files, one containing confirmed cases, on containing deaths, and one recoveries at both country and state/province levels. Each file is organized such that the first column contains state/province name (where applicable), the second column the country name, the third and fourth contain average latitude and longitude, and then the remaining columns containing daily values.

There are a number of different approaches that could be used to import these data into an EViews workfile. We’ll demonstrate an approach that will stack the data into a single panel workfile. We’ll start with importing the confirmed cases data. EViews is able to directly open CSV files over the web using the File->Open->Foreign Data as Workfile menu item:


Figure 1: JH open path

Which results in the following workfile:


Figure 2: JH workfile

Each day of data has been imported into its own series, with the name of the series being the date. There are also series containing the country/region name and the province/state name, as well as latitude and longitude.

To create a panel, we’ll want to stack these date series into a single series, which we can do simply with the Proc->Reshape Current Page->Stack in New Page…


Figure 3: JH stack data dialog

Since all of the series we wish to stack have a similar naming structure – they all start with an “_” we can instruct EViews to stack using “_?” as the identifier, where ? is a wildcard. This results in the following stacked workfile page:


Figure 4: JH stack data workfile

Which is close to what we want, we simply need to tidy up some of the variable names, and instruct EViews to structure the page as a true panel. The date information has been imported into the alpha series VAR01, which we can convert into a true date series with:

    series date = @dateval(var01, "MM_DD_YYYY")
    
The actual cases data is stored in the series currently named "_", which we can rename to something more meaningful with:

    rename _ cases
    
And then finally we can structure the page as a panel by clicking on Proc->Structure/Resize current page, selecting Dated Panel as the structure type and filling in the date and filling in the cross-section and date information:


Figure 5: JH workfile restructure

When asked if we wish to remove blank values, we select no. We now have a 2-dimensional panel, with two sets of cross-sectional identifiers – one for province/state and the other for country:


Figure 6: JH 2D Panel

If we want to sum up the state level data to create a traditional panel with just country and time, we can do so by creating a new panel page based upon the indices of this page. Click on the New Page tab at the bottom of the workfile and select Specify by Identifier Series. In the resulting dialog we enter the country series as the cross-section identifier we wish to keep:


Figure 6: JH page by ID

Which results in a panel. We can then copy the cases series from our 2D panel page to the new panel page with standard copy and paste, but ensuring to change the Contraction method to Sum in the Paste Special dialog:


Figure 7: JH paste dialog


Figure 8: JH panel workfile

With the data in a standard panel workfile, all of the standard EViews tools are now available. We can view a graph of the cases by country by opening the cases series, clicking on View->Graph, and then selecting Individual cross sections as the Panel option.


Figure 9: JH graph of all cross-sections

This graph may be a little unwieldy, so we can reduce the number of cross-sections down to, say, only countries that have, thus far, experienced more than 10,000 cases by using the smpl command:

    smpl if @maxsby(cases, country_region)>10000
    

Figure 9: JH cross-sections with more than 10000 cases

Of course, all of this could have been done in an EViews program, and it could be automated to combine all three data files, ending up with a panel containing cases, deaths and recoveries. The following EViews code produces such a panel:

    'close all existing workfiles
    close @wf
    
    'names of the three topics/files
    %topics = "confirmed deaths recovered"
    
    'loop through the topics
    for %topic {%topics}
        'build the url by taking the base url and then adding the topic in the middle
        %url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_" + %topic + "_global.csv"
    
        'load up the url as a new page
        pageload(page=temp) {%url}
        
        'stack the page into a 3d panel
        pagestack(page=stack_{%topic}) _? @ *? *
        
        'do some renaming and make the date series
        rename country_region country 
        rename province_state province
        rename _ {%topic}

        series date = @dateval(var01, "MM_DD_YYYY")

        'structure the page
        pagestruct province country @date(date)
        
        'delete the original page
        pagedelete temp

        'create the 2D panel page
        pagecreate(id, page=panel) country @date @srcpage stack_{%topic}
    next
    
    'loop through the topics copying each from the 3D panel into the 2D panel
    for %topic {%topics}
        copy(c=sum) stack_{%topic}\{%topic} * @src @date country @dest @date country
        pagedelete stack_{%topic}
    next
    

European Centre for Disease Prevention and Control Data

The second repository we'll use is data provided by the ECDC's Covid-19 Data site. They provide an extremely easy to use data for each country, along with population data. Importing these data into EViews is trivial – you can open the XLSX file directly using the File->Open-Foreign Data as Workfile dialog and entering the URL to the XLSX in the File name box:


Figure 10: ECDC open path

The resulting workfile will look like this:


Figure 11: ECDC workfile

All we need to do is structure it as a panel, which we can do by clicking on Proc->Structure/Resize Current Page and then entering the cross-section and date identifiers (we also choose to keep an unbalanced panel by unchecking the Balance between starts & ends box).


Figure 12: ECDC strcture WF dialog

The result is an EViews panel workfile:


Figure 13: ECDC series

The data provided by ECDC contains the number of new cases and deaths each day. Most presentation of Covid-19 data has been with the total number of cases and deaths per country. We can create the totals with the @cumsum function which will produce the cumulative sum, resetting to zero as the start of each cross-section.

    series ccases = @cumsum(cases)
    series cdeaths = @cumsum(deaths)
    
With this panel we can perform standard panel data analysis, or produce graphs (see the Johns Hopkins examples above). However, since the ECDC have included standard ISO country codes for the countries, we can also tie the data to a geomap.

We found a simple shapefile of the world online, and downloaded it to our computer. In EViews we then click on Object->New Object->GeoMap to create a new geomap, and then drag the .prj file we downloaded onto the geomap.

In the properties box that appears, we tie the countries defined in the shapefile to the identifiers in the workfile. Since the shapefile uses ISO codes, and we have those in the countriesandterritories series, we can use those to map the workfile to the shapefile:


Figure 14: Geomap properties

Which results in the following global geomap:


Figure 15: Global geomap

We can use the Label: dropdown to remove the country labels to give a clearer view of the map (note this feature is a recent addition, you may need to update your copy of EViews to see the None option).

To add some color information to the map we click on Properties and then the Color tab. We'll add two custom color settings – a gradient fill so show differences in the number of cases, and a single solid color for countries with a large number of cases:



Figure 3a: ECDC geomap color range
Figure 3b: ECDC geomap color threshold

And then entering ccases as the coloring series. This results in a map:


Figure 17: ECDC geomap

Again, this could all be done programmatically with the following program (note the ranges for coloring will need to be changed as the virus becomes more wide spread):

    'download data
    wfopen https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx
    rename countryterritorycode iso3
    pagecontract if iso3<>""
    pagestruct(bal=m) iso3 @date(daterep)
    
    'make cumulative data
    series ccases = @cumsum(cases)
    series cdeaths = @cumsum(deaths)
    
    'make geomap for cases
    geomap cases_map
    cases_map.load ".\World Map\TM_WORLD_BORDERS_SIMPL-0.3.prj"
    cases_map.link iso3 iso3
    cases_map.options -legend
    cases_map.setlabel none
    cases_map.setfillcolor(t=custom) mapser(ccases) naclr(@RGB(255,255,255))  range(lim(0,12000,cboth), rangeclr(@grad(@RGB(255,255,255),@RGB(0,0,255))), outclr(@trans,@trans), name("Range")) thresh(12000, below(@trans), above(@RGB(0,0,255)), name("Threshold"))
    
    'make geomaps for deaths
    geomap deaths_map
    deaths_map.load ".\World Map\TM_WORLD_BORDERS_SIMPL-0.3.prj"
    deaths_map.link iso3 iso3
    deaths_map.options -legend
    deaths_map.setlabel none
    deaths_map.setfillcolor(t=custom) mapser(cdeaths) naclr(@RGB(255,255,255)) range(lim(1,500,cboth), rangeclr(@grad(@RGB(255,128,128),@RGB(128,64,64))), outclr(@trans,@trans), name("Range")) thresh(500,cleft,below(@trans),above(@RGB(128,0,0)),name("Threshold"))        
    

New York Times US County Data

The final data repository we will look at is the New York Times data for the United States at county level. These data are also trivial to import into EViews, you can again just enter the URL for the CSV file to open it. Rather than walking through the UI steps, we'll simply post the two lines of code required to import and structure as a panel:

    'retrieve data from NY Times github
    wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
    
    'structure as a panel based on date and FIPS ID
    pagestruct(dropna) fips @date(date)
    
Note that the New York Times have conveniently provided the FIPS code for each county, which means we can also produce some geomaps. We've downloaded a US county map from the Texas Data Repository, and then linked the FIPS series in the workfile with the FIPS_BEA attribute of the map:


Figure 17: Geomap FIPS properties

The full code to produce such a map is:

    'retrieve data from NY Times github
    wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
    
    'structure as a panel based on date and FIPS ID
    pagestruct(dropna) fips @date(date)
    
    'set displaynames for use in geomaps
    cases.displayname Confirmed Cases
    deaths.displayname Deaths
    
    'make geomap
    geomap cases_map
    cases_map.load ".\Us County Map\CountiesBEA.prj"
    cases_map.link fips_bea fips
    cases_map.options -legend
    cases_map.setlabel none
    cases_map.setfillcolor(t=custom) mapser(cases) naclr(@RGB(255,255,255))  range(lim(1,200,cboth), rangeclr(@grad(@RGB(204,204,255),@RGB(0,0,255))), outclr(@trans,@trans), name("Range")) thresh(200, below(@trans), above(@RGB(0,0,255)), name("Threshold"))        
    

Sneak Peaks

One of the features our engineering team have been working on for the next major release of EViews is the ability to produce animated graphs and geomaps (the keen eyed amongst you may have noticed the Animate button on a few of our screenshots). Whilst this feature is a little far away from release, the Covid-19 data does give an interesting set of testing procedures, and we thought we'd share some of the results.


Animation 1: US counties cases evolution


Animation 2: Confirmed cases

9 comments:

  1. Brilliant post, thanks for this!

    ReplyDelete
  2. its really very useful information and presentaion

    ReplyDelete
  3. Hello EViews Team.
    Great Work. Felicitations.
    Very Thanks.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. HELLO. Please, how could l download the population by country data from the site of Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering? For example in an CSV file?
    Cordially

    ReplyDelete
  6. Hello EViews team, I have prepared a small program on EViews, but l still have a doubt: in the logistic model, On what basis K=b1-1 (k= the max value)?

    / The logistic model: C=b1/(1+b2*exp(-b2*trend))

    Best wishes.

    ReplyDelete
  7. Hello, assuming we have the following two methods to determine the peack point;

    1st Method: Estimate the logistic model for the cumulative number of infected cases i.e. d(hat(C)), then we look for the peack date (The return point of the function).

    2nd Method: Estimate the derived function of the logistic model for the number of infected cases per day i.e. hat(d(C)), then we look for the peack date (The max of the function).

    Please, what is the best method, because the two methods do not have the same result?

    Cordially.

    ReplyDelete
  8. It's very detailed, professional and interesting! Thanks!

    ReplyDelete