With the world currently experiencing the Covid-19 crisis, many of our users are working remotely (aside: for details on how to use EViews at home, visit our Covid licensing page) anxious to follow data on how the virus is spreading across parts of the world. There are many sources of information on Covid-19, and we thought we’d demonstrate how to fetch some of these sources directly into EViews, and then display some graphics of the data. (Please visit our follow up post for a few more graph examples).
To begin we'll retrieve data from the Covid-19 Time Series collection from Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering. These data are organized into three csv files, one containing confirmed cases, on containing deaths, and one recoveries at both country and state/province levels. Each file is organized such that the first column contains state/province name (where applicable), the second column the country name, the third and fourth contain average latitude and longitude, and then the remaining columns containing daily values.
There are a number of different approaches that could be used to import these data into an EViews workfile. We’ll demonstrate an approach that will stack the data into a single panel workfile. We’ll start with importing the confirmed cases data. EViews is able to directly open CSV files over the web using the File->Open->Foreign Data as Workfile menu item:
Figure 1: JH open path
Which results in the following workfile:
Figure 2: JH workfile
Each day of data has been imported into its own series, with the name of the series being the date. There are also series containing the country/region name and the province/state name, as well as latitude and longitude.
To create a panel, we’ll want to stack these date series into a single series, which we can do simply with the Proc->Reshape Current Page->Stack in New Page…
Figure 3: JH stack data dialog
Since all of the series we wish to stack have a similar naming structure – they all start with an “_” we can instruct EViews to stack using “_?” as the identifier, where ? is a wildcard. This results in the following stacked workfile page:
Figure 4: JH stack data workfile
Which is close to what we want, we simply need to tidy up some of the variable names, and instruct EViews to structure the page as a true panel. The date information has been imported into the alpha series VAR01, which we can convert into a true date series with:
series date = @dateval(var01, "MM_DD_YYYY")
The actual cases data is stored in the series currently named "_", which we can rename to something more meaningful with:
rename _ cases
And then finally we can structure the page as a panel by clicking on Proc->Structure/Resize current page, selecting Dated Panel as the structure type and filling in the date and filling in the cross-section and date information:
Figure 5: JH workfile restructure
When asked if we wish to remove blank values, we select no. We now have a 2-dimensional panel, with two sets of cross-sectional identifiers – one for province/state and the other for country:
Figure 6: JH 2D Panel
If we want to sum up the state level data to create a traditional panel with just country and time, we can do so by creating a new panel page based upon the indices of this page. Click on the New Page tab at the bottom of the workfile and select Specify by Identifier Series. In the resulting dialog we enter the country series as the cross-section identifier we wish to keep:
Figure 6: JH page by ID
Which results in a panel. We can then copy the cases series from our 2D panel page to the new panel page with standard copy and paste, but ensuring to change the Contraction method to Sum in the Paste Special dialog:
Figure 7: JH paste dialog
Figure 8: JH panel workfile
With the data in a standard panel workfile, all of the standard EViews tools are now available. We can view a graph of the cases by country by opening the cases series, clicking on View->Graph, and then selecting Individual cross sections as the Panel option.
Figure 9: JH graph of all cross-sections
This graph may be a little unwieldy, so we can reduce the number of cross-sections down to, say, only countries that have, thus far, experienced more than 10,000 cases by using the smpl command:
smpl if @maxsby(cases, country_region)>10000
Figure 9: JH cross-sections with more than 10000 cases
Of course, all of this could have been done in an EViews program, and it could be automated to combine all three data files, ending up with a panel containing cases, deaths and recoveries. The following EViews code produces such a panel:
'close all existing workfiles
close @wf
'names of the three topics/files
%topics = "confirmed deaths recovered"
'loop through the topics
for %topic {%topics}
'build the url by taking the base url and then adding the topic in the middle
%url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_" + %topic + "_global.csv"
'load up the url as a new page
pageload(page=temp) {%url}
'stack the page into a 3d panel
pagestack(page=stack_{%topic}) _? @ *? *
'do some renaming and make the date series
rename country_region country
rename province_state province
rename _ {%topic}
series date = @dateval(var01, "MM_DD_YYYY")
'structure the page
pagestruct province country @date(date)
'delete the original page
pagedelete temp
'create the 2D panel page
pagecreate(id, page=panel) country @date @srcpage stack_{%topic}
next
'loop through the topics copying each from the 3D panel into the 2D panel
for %topic {%topics}
copy(c=sum) stack_{%topic}\{%topic} * @src @date country @dest @date country
pagedelete stack_{%topic}
next
European Centre for Disease Prevention and Control Data
The second repository we'll use is data provided by the ECDC's Covid-19 Data site. They provide an extremely easy to use data for each country, along with population data. Importing these data into EViews is trivial – you can open the XLSX file directly using the File->Open-Foreign Data as Workfile dialog and entering the URL to the XLSX in the File name box:
Figure 10: ECDC open path
The resulting workfile will look like this:
Figure 11: ECDC workfile
All we need to do is structure it as a panel, which we can do by clicking on Proc->Structure/Resize Current Page and then entering the cross-section and date identifiers (we also choose to keep an unbalanced panel by unchecking the Balance between starts & ends box).
Figure 12: ECDC strcture WF dialog
The result is an EViews panel workfile:
Figure 13: ECDC series
The data provided by ECDC contains the number of new cases and deaths each day. Most presentation of Covid-19 data has been with the total number of cases and deaths per country. We can create the totals with the @cumsum function which will produce the cumulative sum, resetting to zero as the start of each cross-section.
series ccases = @cumsum(cases)
series cdeaths = @cumsum(deaths)
With this panel we can perform standard panel data analysis, or produce graphs (see the Johns Hopkins examples above). However, since the ECDC have included standard ISO country codes for the countries, we can also tie the data to a geomap.
We found a simple shapefile of the world online, and downloaded it to our computer. In EViews we then click on Object->New Object->GeoMap to create a new geomap, and then drag the .prj file we downloaded onto the geomap.
In the properties box that appears, we tie the countries defined in the shapefile to the identifiers in the workfile. Since the shapefile uses ISO codes, and we have those in the countriesandterritories series, we can use those to map the workfile to the shapefile:
Figure 14: Geomap properties
Which results in the following global geomap:
Figure 15: Global geomap
We can use the Label: dropdown to remove the country labels to give a clearer view of the map (note this feature is a recent addition, you may need to update your copy of EViews to see the None option).
To add some color information to the map we click on Properties and then the Color tab. We'll add two custom color settings – a gradient fill so show differences in the number of cases, and a single solid color for countries with a large number of cases:
Figure 3a: ECDC geomap color range
Figure 3b: ECDC geomap color threshold
And then entering ccases as the coloring series. This results in a map:
Figure 17: ECDC geomap
Again, this could all be done programmatically with the following program (note the ranges for coloring will need to be changed as the virus becomes more wide spread):
The final data repository we will look at is the New York Times data for the United States at county level. These data are also trivial to import into EViews, you can again just enter the URL for the CSV file to open it. Rather than walking through the UI steps, we'll simply post the two lines of code required to import and structure as a panel:
'retrieve data from NY Times github
wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
'structure as a panel based on date and FIPS ID
pagestruct(dropna) fips @date(date)
Note that the New York Times have conveniently provided the FIPS code for each county, which means we can also produce some geomaps. We've downloaded a US county map from the Texas Data Repository, and then linked the FIPS series in the workfile with the FIPS_BEA attribute of the map:
Figure 17: Geomap FIPS properties
The full code to produce such a map is:
'retrieve data from NY Times github
wfopen(page=covid) https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
'structure as a panel based on date and FIPS ID
pagestruct(dropna) fips @date(date)
'set displaynames for use in geomaps
cases.displayname Confirmed Cases
deaths.displayname Deaths
'make geomap
geomap cases_map
cases_map.load ".\Us County Map\CountiesBEA.prj"
cases_map.link fips_bea fips
cases_map.options -legend
cases_map.setlabel none
cases_map.setfillcolor(t=custom) mapser(cases) naclr(@RGB(255,255,255)) range(lim(1,200,cboth), rangeclr(@grad(@RGB(204,204,255),@RGB(0,0,255))), outclr(@trans,@trans), name("Range")) thresh(200, below(@trans), above(@RGB(0,0,255)), name("Threshold"))
Sneak Peaks
One of the features our engineering team have been working on for the next major release of EViews is the ability to produce animated graphs and geomaps (the keen eyed amongst you may have noticed the Animate button on a few of our screenshots). Whilst this feature is a little far away from release, the Covid-19 data does give an interesting set of testing procedures, and we thought we'd share some of the results.
HELLO. Please, how could l download the population by country data from the site of Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering? For example in an CSV file? Cordially
Hello EViews team, I have prepared a small program on EViews, but l still have a doubt: in the logistic model, On what basis K=b1-1 (k= the max value)?
Hello, assuming we have the following two methods to determine the peack point;
1st Method: Estimate the logistic model for the cumulative number of infected cases i.e. d(hat(C)), then we look for the peack date (The return point of the function).
2nd Method: Estimate the derived function of the logistic model for the number of infected cases per day i.e. hat(d(C)), then we look for the peack date (The max of the function).
Please, what is the best method, because the two methods do not have the same result?
Brilliant post, thanks for this!
ReplyDeleteits really very useful information and presentaion
ReplyDeleteHello EViews Team.
ReplyDeleteGreat Work. Felicitations.
Very Thanks.
THanks
ReplyDeleteMore like this please!
This comment has been removed by the author.
ReplyDeleteHELLO. Please, how could l download the population by country data from the site of Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering? For example in an CSV file?
ReplyDeleteCordially
Hello EViews team, I have prepared a small program on EViews, but l still have a doubt: in the logistic model, On what basis K=b1-1 (k= the max value)?
ReplyDelete/ The logistic model: C=b1/(1+b2*exp(-b2*trend))
Best wishes.
Hello, assuming we have the following two methods to determine the peack point;
ReplyDelete1st Method: Estimate the logistic model for the cumulative number of infected cases i.e. d(hat(C)), then we look for the peack date (The return point of the function).
2nd Method: Estimate the derived function of the logistic model for the number of infected cases per day i.e. hat(d(C)), then we look for the peack date (The max of the function).
Please, what is the best method, because the two methods do not have the same result?
Cordially.
It's very detailed, professional and interesting! Thanks!
ReplyDelete