How to Read SAS Plots Into R
This is another in my series of blogs where I take a deep dive into converting a customized R graph into a SAS ODS Graphics graph. This time the example is a needle plot (that's essentially like a bar plot, with lots of tiny bars, plotted along a continuous xaxis).
With all the speculation and uncertainty about future gasoline prices in the US, I decided to use historical gasoline price data for this example. And to get you in the mood for plotting this data, here is a picture my canoeing friend Paula took along the Intracoastal Waterway in South Carolina. Can you guess when she took this picture?
I'll be plotting the U.S. average gasoline price, from the year 2001, to the current value. The height of the 'needle' bars represents the price, and I make the graph a bit easier to read by coloring the needles based on the gasoline price - at each 50 cents higher price, a darker shade of red is used. Below are the two comparable graphs, created using R and SAS.
R Graph, created using ggplot()
SAS Graph, created using Proc SGplot
My Approach
I will be showing the R code (in blue) first, and then the equivalent SAS code (in red) I used to create both of the graphs. Note that there are many different ways to accomplish the same things in both R and SAS - and keep in mind that the code I show here isn't the only (and probably not even the 'best') way to do things. If you know of a better/simpler way to do these things, feel free to share in the comments (note that I'm looking for better/easier-to-follow/best-practices - not just different/shorter but more obscure code!).
Also, the code below is sometimes not presented in the same order I ran it in to create the graphs (so you won't be able to simply copy-n-paste it and run it). And I don't include every bit of code here in the blog (in particular, pieces I've already covered in previous blog posts). I include links to the full R and SAS programs at the bottom of the blog post, that you can download and run.
Getting The Data
The data for this example comes from the U.S. Energy Information Administration. After a bit of searching, I found the data I wanted on the following page. There is a link for downloading the data as an Excel spreadsheet, and I use that URL in the code below. Here is a screen-capture of what the data looks like in the Excel spreadsheet:
In R, I place the URL into a variable I call 'url', and then use the GET() function to get the file from the web, and save it into my account's temporary space using a unique random filename (for example C:\Users\realliso\AppData\Local\Temp\Rtmp00msAp\file65acd695b4e.xls). I save the temporary filename in a variable called 'tf', and then feed that into the read_excel() function, telling it to read the data from the "Data 1" sheet, and skip the first 2 lines.
#install.packages("tidyverse")
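library(readxl)
library(httr)
# The snippets below also rely on these packages:
library(dplyr)    # %>%, mutate(), filter()
library(ggplot2)  # ggplot(), geom_col(), theme(), ...
library(scales)   # dollar(), dollar_format()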
url <- "https://www.eia.gov/dnav/pet/hist_xls/EMM_EPMR_PTE_NUS_DPGw.xls"
GET(url,write_disk(tf<-tempfile(fileext=".xls")))
my_data <- read_excel(tf,sheet="Data 1",skip=2)
In SAS, I use a filename statement to create a filename called 'xlsfile' which points to the URL where the Excel file is stored. I then use a data step to download that file and save it into the current directory (using the same Excel filename, rather than a random/unique name). I then use Proc Import to read the Excel data and save it as a SAS dataset called my_data. Note that the range statement lets me specify the Excel sheet, and range of cells to read (in 'B0' the B is the column name, and the 0 tells it to read until the data stops).
%let xlsname=EMM_EPMR_PTE_NUS_DPGw.xls;
filename xlsfile url "https://www.eia.gov/dnav/pet/hist_xls/&xlsname";
data _null_;
n=-1;
infile xlsfile recfm=s nbyte=n length=len _infile_=tmp;
input;
file "&xlsname" recfm=n;
put tmp $varying32767. len;
run;
proc import out=my_data datafile="&xlsname" dbms=xls replace;
range="Data 1$A4:B0";
getnames=NO;
run;
Preparing The Data
In the R code below, the first line assigns mnemonic names to the two variables read in from the spreadsheet. The second line creates a date variable from the date_time variable. The third line determines which price range the price is in, which I'll use later to color the needles/bars (each 50 cent increment will use a darker shade of red). And the last line subsets the data to limit it to just the data from 2001 onward.
names(my_data) <- c("date_time","gasoline_price")
my_data <- my_data %>% mutate(date=as.Date(date_time))
my_data <- my_data %>% mutate(price_range=as.character(as.integer(gasoline_price/.50)))
my_data <- my_data[my_data$date>="2001-01-01",]
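To make the binning concrete, here is a minimal base R sketch of the same price_range expression applied to a few made-up prices. The toy_prices values are purely illustrative assumptions, not EIA data.

```r
# Illustrative prices only (not real EIA data)
toy_prices <- c(1.49, 2.00, 3.75, 4.50)

# Same expression as above: the number of whole 50-cent steps, stored as text
toy_range <- as.character(as.integer(toy_prices / .50))

print(toy_range)  # "2" "4" "7" "9"
```

Because as.integer() truncates rather than rounds, a price of $1.49 still falls in range "2" (the $1.00 to $1.49 band).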
In SAS, I use the rename= option to rename the variables in the dataset. The Excel dates are read in as dates in SAS (whereas R read them in as datetime), therefore I don't need to convert them. I use a simple equation to assign the price range which I'll use to color the needles. And an if statement lets me keep only the data from 2001 onward.
data my_data; set my_data (rename=(a=date b=gasoline_price));
price_range=int(gasoline_price/.50);
if date>='01jan2001'd and gasoline_price^=. then output;
run;
Plotting The Data
In R, I use the ggplot() function, and tell it the name of my data, x & y variables, and a color variable in the aes() function. I then use the geom_col() function to draw the needles. In this case, the needles are so tightly packed together, the visual effect is solid areas of color (this is the visual effect I wanted).
my_plot <- ggplot(my_data,aes(x=date,y=gasoline_price,color=price_range)) +
geom_col() +
In SAS, I use Proc SGplot, identify my dataset using data=, and then use a needle statement, where I identify my x and y variables, and the group= (which controls the color).
proc sgplot data=my_data noautolegend;
needle x=date y=gasoline_price / group=price_range;
Controlling The Colors
In R, I specify a color for each of the possible values of the price_range variable (1-9), using the scale_color_manual function.
scale_color_manual(values = c(
"1" = "#FFFFCC",
"2" = "#FFEDA0",
"3" = "#FED976",
"4" = "#FEB24C",
"5" = "#FD8D3C",
"6" = "#FC4E2A",
"7" = "#E31A1C",
"8" = "#BD0026",
"9" = "#800026"
)) +
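Since the colors are keyed by name, it can be worth checking that every price_range value in the data has an entry. The sketch below is one way to do that in base R; needle_colors and toy_range are hypothetical names introduced for illustration, and the "10" value is a made-up case standing in for a price of $5.00 or more.

```r
# Same colors as the scale above, held in a named character vector
needle_colors <- c("1" = "#FFFFCC", "2" = "#FFEDA0", "3" = "#FED976",
                   "4" = "#FEB24C", "5" = "#FD8D3C", "6" = "#FC4E2A",
                   "7" = "#E31A1C", "8" = "#BD0026", "9" = "#800026")

# Hypothetical price_range values; "10" has no color assigned
toy_range <- c("2", "4", "7", "10")

# Any value listed here would not get one of the intended colors
unmapped <- setdiff(unique(toy_range), names(needle_colors))
print(unmapped)  # "10"
```

If unmapped is not empty, the palette probably needs another entry before the plot will look as intended.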
In SAS, I create an attribute map dataset, which maps the price_range values (1-9) to the desired linecolors. Since an attribute map dataset can contain attributes for more than one thing, you also have to create an ID variable, which you later specify as the attrid= when you run SGplot.
data myattrmap;
length linecolor $9;
input ID $ value linecolor $;
datalines;
myid 1 cxFFFFCC
myid 2 cxFFEDA0
myid 3 cxFED976
myid 4 cxFEB24C
myid 5 cxFD8D3C
myid 6 cxFC4E2A
myid 7 cxE31A1C
myid 8 cxBD0026
myid 9 cx800026
;
run;
proc sgplot data=my_data dattrmap=myattrmap noautolegend;
needle x=date y=gasoline_price / group=price_range attrid=myid;
Rotating Xaxis Years
By default, the year values along the Xaxis are horizontal. I would typically leave them in the horizontal orientation because they are easier to read that way ... but the year values become a bit crowded in this particular graph, therefore I rotate them 90 degrees. In R, I do that by specifying an angle in the theme for the axis.text.x.
theme(axis.text.x=element_text(face="bold",color="#333333",size=11,angle=90,vjust=.5))
In SAS, I specify options on the xaxis statement to control this. (Note that the rotation is not honored, unless the values along the xaxis get crowded.)
xaxis display=(nolabel)
values=('01jan2001'd to '01jan2022'd by year)
valueattrs=(size=11pt weight=bold color=gray33)
valuesrotate=vertical fitpolicy=rotate notimesplit;
Rotate & Reposition Yaxis Label
By default, the yaxis label ('$/gal') is rotated vertically (ie, the text is sideways), and placed at the middle of the axis. That's ok for long text labels, but when it's a short label I prefer to have the text in the horizontal (un-rotated) orientation, and at the top of the axis - I think it's easier to read, and more intuitive that way. In R, I control this by modifying the 'theme' - it took a bit of trial-and-error to figure out which combination of angle and horizontal/vertical justification got the axis title the way I wanted it.
theme(axis.title.y=element_text(angle=0,hjust=0,vjust=1)) +
In SAS, I can specify the labelposition in the yaxis statement. And it actually places the label at the top of the axis, which I like better than placing it on the top/left like R does (because it takes less space). If anyone knows how to get the R axis label at the top of the yaxis like SAS does, please share the command(s) in the comments!
yaxis labelposition=top
Add Yaxis on Right
Like a lot of time series graphs, the most important information (ie, the latest data) is on the right, but the default yaxis is in the traditional location on the left. Therefore I wanted to also add a yaxis on the right. In R, I do this by adding a secondary axis, and 'deriving' name/breaks/labels/etc from the first yaxis.
scale_y_continuous(sec.axis=sec_axis(trans=~.,name=derive(),breaks=derive(),
labels=derive(),guide=derive())) +
In SAS, I use a y2axis statement, and specify the same options I did in the yaxis.
y2axis labelposition=top labelattrs=(weight=bold) values=(0 to 5 by .5)
valueattrs=(size=11pt weight=bold color=gray33)
grid offsetmin=0 offsetmax=0;
Dollar Format
Here in the U.S., gasoline prices are in dollars per gallon, therefore I wanted the values along the yaxis to have a '$' on the left, and show 2 decimal places. In R, I specify this formatting when I define the yaxis.
scale_y_continuous(limits=c(0,5),breaks=seq(0,5.00,.50),
expand=c(0,0),labels=scales::dollar_format(),
Similarly, in SAS, I use the valuesformat option on the yaxis statement.
yaxis labelposition=top labelattrs=(weight=bold)
values=(0 to 5 by .5) valuesformat=dollar8.2
Overlaying Marker for Latest Value
If the prices are decreasing sharply, it is sometimes difficult to see the top of the last/latest bar. Therefore I add an 'x' marker at the top of the last bar. Here's an example showing the March 23, 2020 version of the graph, both with and without an 'x' at the top of the last needle, so you can see what a difference it makes. (I think it's a lot easier to see that the values are falling sharply in the version with the x, on the right.)
There are multiple ways to add a marker to a graph, in both R and SAS ... In this R example, I created a 2nd dataset, containing only the one row of data for the last (most recent) value. I then overlay the point on the graph, plotting data from just that dataset.
max_day_data <- my_data %>% filter(date==max(date))
geom_point(data=max_day_data,shape='cross',size=2,color="black") +
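For readers who want to check the "last row" selection without dplyr, here is a small base R sketch of the same idea. The toy_data values are made up for illustration and are not actual EIA prices.

```r
# Made-up weekly prices, for illustration only
toy_data <- data.frame(
  date = as.Date(c("2020-03-09", "2020-03-16", "2020-03-23")),
  gasoline_price = c(2.38, 2.25, 2.12)
)

# Keep only the row(s) whose date equals the most recent date
last_row <- toy_data[toy_data$date == max(toy_data$date), ]

print(last_row$gasoline_price)  # 2.12
```

As with the filter() version, this would return more than one row if several rows shared the latest date.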
In SAS, I added an extra variable to the main dataset, only containing a value for the last (most recent) value - all the other rows in the dataset have a 'missing' value. I then overlaid the 'x' marker on the SGplot using a scatter statement. Note that in SAS, the data for all the plots being overlaid in SGplot must be in a single dataset (you could use a 2nd 'annotate' dataset to overlay a marker, but that's a different technique).
data my_data; set my_data end=last;
if last then do;
final_price=gasoline_price;
end;
run;
scatter y=final_price x=date / y2axis markerattrs=(size=7px color=black symbol=X);
Insert Calculated Values in Footnote/Caption
It's difficult to judge what the ending date for the final data point is by just looking at the graph. Therefore I programmatically determine the values, and display them in a text footnote/caption below the graph.
In R, I use the max() function on the max_day_data, and create variables for max_date and end_price. I then use the labs() function to create a text caption label below the graph, with those values inserted into the text.
max_date <- max(as.character(max_day_data$date, format="%B %d, %Y"))
end_price <- max(dollar(max_day_data$gasoline_price))
labs(caption=paste("Data source: eia.doe.gov ",max_date," (ending price = ",end_price,")")) +
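As a rough sketch of how the caption text comes together, the base R snippet below builds a similar string from a made-up date and price. It uses format() and sprintf() in place of the helpers above, which is my substitution rather than the original code, and toy_date and toy_price are assumed values for illustration.

```r
# Made-up values, for illustration only
toy_date  <- as.Date("2020-03-23")
toy_price <- 2.12

# Month name follows the session locale (e.g. "March" in an English locale)
date_text  <- format(toy_date, "%B %d, %Y")

# Dollar sign and two decimal places
price_text <- sprintf("$%.2f", toy_price)

caption_text <- paste0("Data source: eia.doe.gov ", date_text,
                       " (ending price = ", price_text, ")")
print(caption_text)
```

Using paste0() here avoids the extra spaces that paste() inserts between its arguments by default.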
In SAS, I use Proc SQL to determine the values with the maximum date, and save them into macro variables. I then use those macro variables in a footnote statement.
proc sql noprint;
select put(max(date),worddate.) into :max_date separated by ' ' from my_data;
select put(gasoline_price,dollar5.2) into :end_price separated by ' ' from my_data having date=max(date);
quit; run;
footnote2 h=9pt font='albany amt' color=gray77
"Data source: eia.doe.gov &max_date (ending price = &end_price)";
My Code
Here is a link to my complete R program that produced the graph.
Here is a link to my complete SAS program that produced the graph.
If you have any comments, suggestions, corrections, or observations - I'd be happy to hear them in the comments section!
Source: https://blogs.sas.com/content/graphicallyspeaking/2021/02/05/sas-graphs-for-r-programmers-needle-plots/