Batch Submission of Data - Loading Files Using AQS
Loading Files Using AQS
This section covers the file processing functions available within AQS. It assumes a file that has been through the Stage process but has gone no further in AQS. (This would be the case if the Final Processing Step selected at the ENSC was Stage.)
Loading Files Using AQS - Training Video
Batch Form Overview
After logging in to AQS, the user selects Batch from the Menu Bar.
AQS Batch Form Overview - Training Video
Process by File Tab
The "Process by File" tab shows a summary of all of the processing that has been performed by AQS for a submitted file.
The AQS Batch "Process by File" tab has 5 main areas.
1. File processing History and Status area.
The History and Status area shows a list of the files that have been submitted and the most recent action performed. There is one row per file.
There are six columns in this area, and they are:
- Submission Date: The original date and time the file was staged to AQS (format: YYYYMMDD HH:mm, where HH uses the 24-hour clock; times are in Eastern Time).
- File Name: The name of the file.
- User name: The name of the last user in the agency who initiated a processing step for this file.
- Records in file: The total number of transactions in the file (records for a flat file, or the equivalent for an XML file).
- Date (last): The last time an action was performed on this file.
- Process Status: The most recent process initiated for this file and the status of the process.
  - Process can be:
    - Stage = Converting data from the submitted format to the AQS database format
    - Load = Loading data into the AQS database
    - CRST = Critical Review and Statistical Tests
    - Post = Incorporating data into summaries and making data public
  - Status can be:
    - Submitted (waiting to execute)
    - Active (actively processing)
    - Error (completed with errors)
    - Completed (completed with no errors)
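As a reading aid, the column conventions above can be modeled in a few lines of Python. This is a minimal sketch, not part of AQS: the names `Status`, `parse_submission_date`, and `is_finished` are hypothetical, and the sketch assumes the Submission Date has been copied from the form as plain text.

```python
from datetime import datetime
from enum import Enum


class Status(Enum):
    """Status values shown in the Process Status column (hypothetical model)."""
    SUBMITTED = "Submitted"  # waiting to execute
    ACTIVE = "Active"        # actively processing
    ERROR = "Error"          # completed with errors
    COMPLETED = "Completed"  # completed with no errors


def parse_submission_date(text: str) -> datetime:
    """Parse a Submission Date such as '20130415 14:30' (YYYYMMDD HH:mm, 24-hour clock).

    The result is a naive datetime; the form reports these times in Eastern Time.
    """
    return datetime.strptime(text, "%Y%m%d %H:%M")


def is_finished(status: Status) -> bool:
    """True once the process has stopped running, whether or not it had errors."""
    return status in (Status.ERROR, Status.COMPLETED)
```

For example, `parse_submission_date("20130415 14:30")` yields 15 April 2013 at 14:30, and a file whose status is Submitted or Active is still waiting or running, so refreshing the form later is the only action needed.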
2. Load processing results summary.
This area provides a summary of what happened to each record (transaction) in the file during all of the load processes.
There are three columns in this area, and they are:
- Recs Loaded: The total number of records from this file that have been successfully loaded into AQS. Keep in mind that Load means different things for different types of data. For everything except raw data, a loaded record is fully in AQS. Loaded raw data still needs to be posted.
- Recs Failing to Load: The number of records that could not be loaded into AQS because of an error. "Recs Loaded" plus the value in this column should equal the number of records in the file.
- StatCR Finding Count: The number of records that raised concern during the Statistical and Critical Review (StatCR) process. The StatCR process runs automatically every time the load process is run on raw data. It compares new data to existing data, expected ranges, and similar criteria, and presents any "findings" of possible outliers or other problematic data. These are not errors; they are oddities in the data flagged for the user's information.
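The consistency rule for these columns can be written as a one-line check. This is a minimal sketch; `check_load_counts` is a hypothetical helper that takes the values as read from the form.

```python
def check_load_counts(records_in_file: int, recs_loaded: int, recs_failing: int) -> bool:
    """Return True when every record in the file is accounted for by the load.

    Recs Loaded + Recs Failing to Load should equal Records in file.
    The StatCR Finding Count is deliberately not an input: findings are
    warnings on loaded records, not load failures.
    """
    return recs_loaded + recs_failing == records_in_file
```

If the check fails, the load may still be in progress, so refreshing the form before investigating is a reasonable first step.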
3. Post processing results summary.
The Post processing results summary area provides a summary of the file with respect to the POST processes.
There are three columns in this area, and they are:
- Records to Post: The total number of Raw Data records from this file that need to be posted (the number of successfully loaded raw data records). If this number is more than zero, the Post process needs to be executed to complete the submission of this data to AQS.
- Skip'd Monitors: The number of monitors for which the posting of data was skipped during processing. This should only happen if another user is editing or submitting data for the same monitors.
- Records Posted: The number of records for which post processing completed successfully. When this number matches the "Records to Post" column for the file, processing is complete.
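The three Post columns combine into a simple decision about whether a file is done. This is a minimal sketch under stated assumptions: `post_state` and its return strings are hypothetical, and the skipped-monitor branch assumes a shortfall with skipped monitors reflects the concurrent-editing case described above.

```python
def post_state(records_to_post: int, records_posted: int, skipped_monitors: int) -> str:
    """Summarize the Post columns for one file (illustrative only)."""
    if records_to_post == 0:
        # No successfully loaded raw data, so there is nothing to post.
        return "nothing to post"
    if records_posted == records_to_post:
        # Records Posted matches Records to Post: processing is complete.
        return "complete"
    if skipped_monitors > 0:
        # Some monitors were skipped, e.g. another user was working on them.
        return "incomplete: monitors skipped"
    return "run Post"
```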
4. Process control - where results can be reviewed, and additional processes performed on files.
The process control area allows the user to review the results of previously processed files and perform additional processing. It acts only on the file (row) that is highlighted in the top part of the form.
Depending on the status of the file, certain areas will be unavailable (grayed out). File processing is controlled from the horizontal row labeled "Process selected file through".
The user can select either Load or Post. Selecting Post also performs Load processing on the file if necessary.
The reports generated by each process are shown below that process. For example, the Load process generates two reports, the Load Summary and Errors report and the StatCR report. The Post process generates only one report, the Raw Data Inventory report.
The last group of buttons in the Process Control area is labeled 'Other'. These buttons perform the following actions:
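The relationship between the selected process and the reports it produces can be sketched as a small lookup. This is a hypothetical model, not AQS code; following the note in the Post section, it assumes that selecting Post always re-runs Load, which is why Post regenerates the Load reports as well.

```python
# Reports generated by each process, as listed in the Process Control area.
REPORTS = {
    "Load": ["Load Summary and Errors", "StatCR"],
    "Post": ["Raw Data Inventory"],
}


def processes_for(selection: str) -> list[str]:
    """Processes run when the user selects Load or Post for an already-staged file."""
    if selection == "Load":
        return ["Load"]
    if selection == "Post":
        # Assumption: Post runs the file through Load first, then Post.
        return ["Load", "Post"]
    raise ValueError(f"unknown selection: {selection!r}")


def reports_for(selection: str) -> list[str]:
    """Reports (re)generated by the selection, in process order."""
    return [report for process in processes_for(selection) for report in REPORTS[process]]
```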
- Show User Log: This opens the same file that was emailed to the user (summarizing the most recent processing on the file) in a browser window.
- Goto ENSC: This opens the ENSC (file submission) page in a browser window.
- Refresh Sessions: This updates all of the information in the top part of the screen. If the user is waiting for a job to finish, the user can press this button rather than reloading the entire form to get updated information.
5. Process flow - A reminder of how AQS processes a file.
Stage
The first step is to Stage the data file. This means to transfer the file to AQS via the Exchange Network and then to convert the data to the AQS database format.
The following are the most common errors that can occur with Stage:
- If the file never shows up on the Batch form, then there was a transfer error and AQS never received the file.
- To diagnose: Review the status email from the Exchange Network and contact the Exchange Network Node Help Desk (email: nodehelpdesk@epacdx.net, phone: (888) 890-1995).
- All Stage failures where AQS actually receives the file result in the status "UPLOAD - FATAL" on the Batch form. The email from the Exchange Network will provide a status that can be interpreted as follows:
- "E_AccessDenied: Access Denied" This means that the AQS user-id provided on the ENSC submission form is not authorized for the Exchange Network user-id that submitted the file. The user can configure this on the AQS Admin/Security form.
- "E_LoaderNoProcess: AQSLOAD process is not active." This means that the AQS batch process started, but did not finish. Contact the EPA Call Center (Email: epacallcenter@epa.gov, Phone: (866) 411-4372) for assistance.
- "E_LoaderStartLimitExceeded: AQSLOAD start time limit has been exceeded." This means that the user's account is not set up properly for batch submissions. Contact the EPA Call Center for assistance.
- "E_AccessDenied: Unknown AQS User Id specified. Access Denied." This means that an invalid AQS user-id was provided on the ENSC submission form.
- "E_InvalidScreeningGroup: Invalid Screening Group" This means that the user is not authorized for the selected screening group. The user should check the screening group provided and correct if incorrect, or contact the EPA Call Center if the user needs access to that screening group.
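When triaging several failed submissions, the list above can be turned into a lookup keyed on the status text from the Exchange Network email. This is a minimal sketch; `STAGE_ERRORS` and `stage_action` are hypothetical, and the sketch assumes the status text is pasted exactly as it appears in the email.

```python
# Stage failure statuses and the suggested action, taken from the list above.
STAGE_ERRORS = {
    "E_AccessDenied: Access Denied":
        "AQS user-id not authorized for the submitting Exchange Network user-id; "
        "configure on the AQS Admin/Security form.",
    "E_LoaderNoProcess":
        "Batch process started but did not finish; contact the EPA Call Center.",
    "E_LoaderStartLimitExceeded":
        "Account not set up for batch submissions; contact the EPA Call Center.",
    "E_AccessDenied: Unknown AQS User Id specified":
        "Invalid AQS user-id on the ENSC submission form.",
    "E_InvalidScreeningGroup":
        "Check the screening group, or contact the EPA Call Center for access.",
}


def stage_action(message: str) -> str:
    """Return the suggested action for a Stage failure status message."""
    # Check longer keys first so that a more specific key wins
    # if one key is ever a prefix of another.
    for key in sorted(STAGE_ERRORS, key=len, reverse=True):
        if message.startswith(key):
            return STAGE_ERRORS[key]
    return "Unrecognized status: not in the list above."
```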
Load
After the file has been staged, it can be loaded. Loading the file moves the submitted data into the main AQS database and performs field validation and relational checks (e.g., whether prerequisite data is present).
The Statistical Tests and Critical Review (sometimes abbreviated StatCR or CRST) process is also performed at the end of the load step for raw data.
This process compares data to other values, expected ranges, and similar criteria, and presents any "findings" of possible outliers or other problems. These "findings" are not errors; they indicate data points that may be outliers. The data may be fine, but the user should be alerted to any potential problems.
[Note] the results of the StatCR process can change over time. For example, results for an incomplete calendar quarter may differ from results for the complete quarter (the data evaluated includes the data in the submitted file plus any data already in AQS).
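To illustrate the difference between a finding and an error, here is a hypothetical expected-range check. The actual StatCR tests are not described in this section; this sketch only shows the general idea that out-of-range values are reported for review while every value is kept. `Finding`, `range_findings`, and the monitor labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Finding:
    monitor: str
    timestamp: str
    value: float
    reason: str


def range_findings(samples: Iterable[tuple[str, str, float]],
                   low: float, high: float) -> list[Finding]:
    """Flag values outside [low, high] as findings (hypothetical check).

    Nothing is rejected: flagged values stay in the data and are only
    reported so the user can review them.
    """
    findings = []
    for monitor, timestamp, value in samples:
        if value < low or value > high:
            findings.append(Finding(monitor, timestamp, value,
                                    f"outside expected range {low}-{high}"))
    return findings
```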
The Load process generates the "Load Summary and Errors" report and the "Statistical and Critical Review" report.
Also, the fields in the Load Status area at the top of the form are updated, and the system sends the user an email with a summary of the processing and links to all reports that were generated. This includes a link to a file that contains only the records in error.
Load Summary and Errors Report
This report contains two parts: Load Summary and Load Detail.
The first part is a table summarizing, by transaction type, the number of records in error, the number of records loaded (in the column labeled "Pre-Production"), and the number of records posted. Recall that raw data records are loaded (to pre-production), while records that are not raw data are posted if they are not in error.
The second part details any errors in the data on a record-by-record basis. Each record, in transaction format, is presented along with its error. If the file contains more than a few thousand errors, the system stops listing them, so the report will not include every error.
Statistical and Critical Review report
The second report generated by the load process contains information about irregularities in the processed data. These are not errors, but warnings (or "findings"). This report will list the monitor, the date/time, and the problem with any data values.
Load Errors
Either the "Records Failing to Load" column at the top of the screen or the Load Summary and Errors report will show the user whether there are load errors.
Clicking the Load Summary and Errors button opens the Load report in a separate browser window. The Load Detail is on the third page of this report. Examples of load errors include:
- A raw data value is outside the acceptable values (e.g., the wrong unit code was used in the file).
- An incorrect site ID on the raw data transaction.
If errors occurred during the load process, the data can be fixed in two ways. The first is to fix it in AQS via the Correct option in the Menu Bar. (Using Correct to edit data is covered elsewhere.) The second is to use the link in the batch job email to download the file of error transactions, fix the errors, and resubmit the file.
Raw Data
If the user's file contains raw data, the user must complete the Post process. If the file contains no raw data, the data has already been placed in the public part of the AQS database and file processing is complete.
Post
The post process is necessary for raw data. It moves data from where it can only be seen by members of a select screening group to where it can be seen by any AQS user (and any EPA web application, and thus the public). It also updates all of the summary values (e.g., NAAQS durations, Daily, Annual, and Design Value if applicable) that the raw data contributes to. It is built into the process as a stopping point to allow for review of any load errors or statistical findings before making the data public.
[Note] when the user Posts a file, it is run through both the Load (including StatCR) and Post processes in AQS. This ensures that all the data in a file is processed together and no "stragglers" are left behind in our multi-step process. The unfortunate side effect is that older Load Summary and Errors and StatCR reports will be overwritten with new (and perhaps empty) results. If a prior version of these reports is needed, it can be obtained from the History tab. (All reports are deleted from AQS after two weeks.)
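Because reports are deleted after two weeks, it can help to check whether an older report should still be retrievable before looking for it. A minimal sketch, assuming "two weeks" means 14 calendar days from the day the report was generated; the exact deletion time is not specified here, and `report_available` is a hypothetical helper.

```python
from datetime import date, timedelta

# Assumption: reports are kept for 14 calendar days after they are generated.
REPORT_RETENTION = timedelta(weeks=2)


def report_available(generated_on: date, today: date) -> bool:
    """Rough check of whether a report generated on `generated_on` is still kept."""
    return today - generated_on < REPORT_RETENTION
```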
Raw Data Inventory report
The Post process generates the Raw Data Inventory report. This report gives a count, by monitor and calendar quarter, of each type of data action (insert, update, and delete) that AQS performed.
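The shape of the inventory (one count per monitor, calendar quarter, and action type) can be sketched with `collections.Counter`. This is a hypothetical reconstruction for reading the report, not how AQS computes it; the input tuples and monitor labels are invented for illustration.

```python
from collections import Counter
from datetime import date


def calendar_quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2013-Q2'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def raw_data_inventory(actions) -> Counter:
    """Count data actions by (monitor, calendar quarter, action).

    `actions` is an iterable of (monitor, sample_date, action) tuples,
    where action is "insert", "update", or "delete".
    """
    return Counter((monitor, calendar_quarter(d), action) for monitor, d, action in actions)
```

For example, two inserts dated in April and June and one delete dated in July for the same monitor produce a count of 2 inserts in Q2 and 1 delete in Q3.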
After running the Post process the submittal of raw data is complete.
When Things Go Wrong
Unfortunately, not all files will be submitted in one step from the ENSC. Files may have an error that causes batch processing to stop (whether the entire file stops processing or just the records in error depends on the "Stop on Error" setting at the ENSC).
[Note] in the second quarter of 2013:
- 7,800 files, containing 50 million transactions total, were submitted to AQS.
- 25% of files with Raw Data transactions had a raw data error.
- 50% of files without Raw Data had an error (e.g., site, monitor, or QA data error).
- A file with an error was re-processed (corrected and re-loaded) an average of 1.5 times.
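For scale, the totals above imply the average file size for that quarter (an average only; the distribution of file sizes is not given):

```latex
\[
\frac{50{,}000{,}000\ \text{transactions}}{7{,}800\ \text{files}} \approx 6{,}410\ \text{transactions per file}
\]
```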
When things go wrong, the user can either use the AQS Correct option (covered in the Correct section of this document) or fix the data on the user's end, in the file or in the user's own database, and regenerate the file.