Discussion
Critique of the results
The first positive element of the new analysis implementation worth mentioning is that having all the intermediate steps (spectra, values, etc.) saved is a great help for debugging and understanding results (as could be seen in the 354 keV case, for example). This is how we can tell that, apart from the few elements mentioned before, the results presented are a correct reflection of the data set.
The (overall small) differences with the results extracted from the same dataset using the previous analysis code support the idea that an over-correction (or at least an incorrect correction) of the time shifts is applied there. This is expected, given the size of the previous analysis code (1140 lines of code in the main function, calling many outside dependencies). It would take a long time to go through the original code in order to find the proper way to correctly input the time shift parameters for a new dataset.
The compatibility of the experimental values with TALYS is limited, but not particularly worse than for other nuclei we have already studied [1], [3], and [2]. For comparison, Figure 65 shows an example of results on even-even isotopes, compared to predictions from reaction codes. The agreement between experimental values from different analyses (the previous code and P. Scholtes’ values) gives us added confidence in the results.
The correlation matrices are overall positive (almost flat), with an average correlation factor typically around 0.5 (with a standard deviation of 0.25). This is expected, as the target mass and HPGe detection efficiencies are the main sources of parametric uncertainty: they apply to all the neutron energy windows in the same way and thus create high correlations. The effect of the two fission chambers might further enhance the positive correlations.
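To make this mechanism concrete, the following toy model (not part of the analysis code; the sample sizes and uncertainty values are illustrative) samples a common parametric factor shared by all windows plus an independent per-window fluctuation of comparable size; the off-diagonal correlation factors then settle around 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
n_iter, n_windows = 10_000, 6
# common parametric part (target mass, HPGe efficiency), identical for all windows
shared = rng.normal(1.0, 0.03, size=(n_iter, 1))
# independent per-window part (counting statistics)
independent = rng.normal(1.0, 0.03, size=(n_iter, n_windows))
samples = shared * independent        # toy "cross sections" per iteration and window

corr = np.corrcoef(samples, rowvar=False)
print(corr.round(2))                  # off-diagonal values cluster around 0.5
```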
When compared to previous correlation matrices, either from another Monte Carlo analysis (Figure 66) or from a semi-Monte Carlo approach [4], [5], and [6] (Figure 67), our results are not too different, with mostly positive correlations. The negative correlations observed in other analyses may be explained by the use of only the UF4 fission chamber, which reduces the global effect of the fission chambers and leaves other sources (such as the time uncertainty) with a greater impact on the correlation factors. Still, it is important to remember that the correlation matrix reflects how the data was analyzed (see earlier). Therefore, it is normal that different analysis methods yield different matrices, even if the results (central values and uncertainties) are very compatible.
Of course, as we saw earlier, there are still elements to improve, mostly in the fitting procedure. It is at the top of the list of future improvements.
SWOT analysis
To conclude on the interest of the new analysis code, we draw up a Strengths-Weaknesses-Opportunities-Threats (SWOT) analysis:
- Strengths
The code is a full Monte Carlo analysis of the data, including sources of uncertainty that would be difficult to include in an analytical version (namely, uncertainties on parameters impacting the time of flight / neutron energy). By saving all the intermediate files, debugging or the study of suspicious results is made much simpler, with the ability to travel back from a result file through the different steps leading to it.
- Weaknesses
The processing is currently very slow (about 2 hours per full Monte Carlo iteration [7]). The fitting procedure has been shown to be fragile, with occasional unexpected failures to produce a meaningful value.
- Opportunities
New data from Grapheme, as well as data that will be recorded at NFS, will benefit greatly from this complete and flexible analysis code.
- Threats
There is a dual risk for the future of such a code. Either it is made so specific that any new case (in terms of number of detectors, …) requires completely rewriting it or adding new modules; or it becomes too heavy, with so many options that most of the user’s time is spent building the pipeline. What we want to avoid is a repeat of the previous code’s fate: a code so specific to one dataset that it cannot be reused for new data.
The following part discusses how we can improve the code, in answer to the weaknesses, while being careful not to fall into the pitfalls listed as threats.
Perspectives
There are several perspectives for improvement on different fronts: the inner workings of the code, the way users interact with the code, and the physics that can be treated by the code.
Fit improvements
The top priority is to improve the fit procedure.
Currently, the fit is performed using the curve_fit method of the scipy library for Python [8], [9]. This method performs a least-squares minimization of the formula used to describe the spectrum shape (linear background and Gaussian peaks). The fit uses the Trust Region Reflective method (the default for a minimization with bounds); the choice between this default and the other methods offered by the scipy library (e.g. the dogleg algorithm) was not investigated.
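For reference, a minimal sketch of this kind of fit is shown below, using synthetic data, a single peak and illustrative starting values and bounds (the actual fitting script handles an arbitrary number of parasite peaks):

```python
import numpy as np
from scipy.optimize import curve_fit

def peak_model(x, a, b, area, centroid, sigma):
    """Linear background plus one Gaussian peak."""
    background = a * x + b
    gauss = area / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - centroid) / sigma) ** 2)
    return background + gauss

# Synthetic spectrum slice around a 354 keV-like transition (placeholder data).
rng = np.random.default_rng(1)
x = np.linspace(340.0, 370.0, 120)
counts = peak_model(x, -0.5, 300.0, 2000.0, 354.0, 1.2) + rng.normal(0.0, 5.0, x.size)

p0 = (0.0, float(counts.min()), 1000.0, 354.0, 1.0)                 # starting values
bounds = ([-np.inf, 0.0, 0.0, 350.0, 0.3], [np.inf, np.inf, np.inf, 358.0, 3.0])
popt, pcov = curve_fit(peak_model, x, counts, p0=p0, bounds=bounds)  # bounds -> 'trf' by default
```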
The ROOT code relies on a log-likelihood fitting method, which is better adapted to the adjustment of count histograms. One way to improve the fit would therefore be to implement a log-likelihood fit. This can be done in scipy by subclassing the rv_discrete base class in order to describe the spectrum shape. This method requires a significant amount of work and testing before being validated.
Quicker to implement, one could test the dogleg fitting algorithm of curve_fit or, with slightly more work, the different fitting methods offered by scipy’s optimize.minimize function.
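As an illustration of the latter option, here is a minimal sketch (not the analysis code; the peak shape, starting values and bounds are placeholders) of a Poisson log-likelihood fit built on optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

def peak_model(x, a, b, area, centroid, sigma):
    """Linear background plus one Gaussian peak (same shape as for curve_fit)."""
    return a * x + b + area / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - centroid) / sigma) ** 2)

def neg_log_likelihood(params, x, counts):
    mu = np.clip(peak_model(x, *params), 1e-9, None)   # expected counts must stay positive
    # Poisson negative log-likelihood, up to a constant (log(counts!) does not depend on params)
    return np.sum(mu - counts * np.log(mu))

# Synthetic counting data (placeholder spectrum slice).
rng = np.random.default_rng(1)
x = np.linspace(340.0, 370.0, 120)
counts = rng.poisson(peak_model(x, -0.5, 300.0, 2000.0, 354.0, 1.2))

result = minimize(
    neg_log_likelihood,
    x0=(0.0, 250.0, 1000.0, 354.0, 1.0),
    args=(x, counts),
    method="L-BFGS-B",
    bounds=[(None, None), (0.0, None), (0.0, None), (350.0, 358.0), (0.3, 3.0)],
)
best_fit_params = result.x
```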
Additional work on the starting values given to the fitting procedure, and on the bounds for the parameters, might also improve the results. In particular, the guess for the background level could be improved.
Finally, the current fitting script has been written to be completely flexible and to accommodate any number of parasites. To simplify the code and reduce the number of parameters, one could write fallback scripts that perform the fit with a more specific function in the usual cases (particularly when there is only one parasite). (That goes against the principle that special cases aren’t special enough to break the rules, but on the other hand, practicality beats purity.)
Update
See the Appendix for updated information on the fit improvements.
User improvements
For general users, the current code should be usable out of the box, but some modifications will be helpful to accommodate other experiments.
The definition of the analysis pipeline should be made easier by providing dedicated scripts to set up the different analysis steps. A yaml file, following GitLab’s CI/CD model, could be a way to implement this. However, that would require a significant amount of work for a small gain. A series of bash scripts to configure and run the pipeline is easier to envision and will do the job nicely enough.
Additionally, a Python tool (class, decorator, …) to simplify the file and task dependency management could be created, so that new tasks can be added easily without needing to understand the details of doit. In particular, a task factory could be provided for simple, usual tasks (2D histogram projection, histogram scaling, …).
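As an illustration, such a task factory for one-input/one-output steps could look like the following sketch (the function, script and file names are hypothetical, not the actual pipeline):

```python
# Hypothetical helper wrapping the doit boilerplate for "usual" tasks.
def simple_task(name, source, target, command):
    """Return a doit task dictionary for a one-input/one-output processing step."""
    return {
        "name": name,
        "actions": [command.format(source=source, target=target)],
        "file_dep": [source],
        "targets": [target],
    }

def task_project_2d_histograms():
    """Example use: project a set of 2D histograms onto their energy axis."""
    for detector in ("ge1", "ge2"):
        yield simple_task(
            name=detector,
            source=f"spectra/{detector}_2d.txt",
            target=f"spectra/{detector}_projection.txt",
            command="python project_histogram.py {source} {target}",
        )
```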
On the Monte Carlo side, it would be good to have the possibility to run the iterations side by side. This will be possible by adding a setup step at the start, in which a temporary working directory is created with links to the source files of interest. The iteration then runs entirely with the files in this directory, allowing several jobs to run in parallel, each in its own working environment. Combined with the job dependency option of sbatch to automatically run the post-loop stage, the user could start a full Monte Carlo analysis in one command and get a result in a couple of hours (compared to a couple of days at the moment).
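A possible driver for this workflow is sketched below (script names, paths and the number of iterations are assumptions); it relies on sbatch’s --parsable output and its --dependency=afterok option:

```python
import os
import subprocess
import uuid
from pathlib import Path

def submit(script, *args):
    """Submit a batch script and return the Slurm job id."""
    out = subprocess.run(["sbatch", "--parsable", script, *args],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip().split(";")[0]

job_ids = []
for _ in range(100):                                    # number of Monte Carlo iterations
    workdir = Path("mc_runs") / uuid.uuid4().hex[:8]    # one working directory per iteration
    workdir.mkdir(parents=True)
    for source in Path("source_files").glob("*"):       # links instead of copies
        os.symlink(source.resolve(), workdir / source.name)
    job_ids.append(submit("run_iteration.sh", str(workdir)))  # hypothetical iteration script

# The post-loop stage starts only once every iteration has finished successfully.
subprocess.run(["sbatch", f"--dependency=afterok:{':'.join(job_ids)}", "post_loop.sh"],
               check=True)
```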
Finally, to make porting the framework easier, a Docker or Singularity image could be created that would include the dependencies needed for the code to run (the task files would not be included in the image).
Technical code improvements
A few technical improvements, which will only impact the inner workings of the analysis without changing the way it is used (or the way it produces results), are foreseen:
- Put the context information (namely, detector names, paths, …) in a yaml file imported as constants in a Python module used by the doit task scripts, instead of treating it individually at the top of each file. (See context module in task files for update; a minimal sketch of such a module is given after this list.) Using this context module, we can rewrite the task code, making sure there are no hard-coded values.
- Move the HPGe detector efficiency sampling to the new MC iteration stage, rather than somewhere inside the iteration (see MC iteration setup), for consistency, and ensure a proper central-value sampling of the efficiencies when the iteration uuid is 000000 (see MC iteration setup).
- The number of parameter files can be reduced. It should be checked that each parameter value (detector distance of flight, efficiency, …) is written only once, in one parameter file.
- The task functions still written in the task files should be moved at least to an outside module, or (better) to scripts that can be called directly from the command line. (See context module in task files for update.)
- The Python code formatter black has been used in the late part of the project to unify the format of the code. This typically leads to longer files, but generally increases readability. In the future, black could be used as a default validation step for the Python scripts, ensuring consistent formatting and making debugging easier.
- The doit task dependencies are, at the moment, computed using a file signature that takes time to calculate. Since many files are involved in the analysis pipeline, a significant amount of time is spent computing the files’ md5 signatures (and since we run in a Monte Carlo loop, most of the files need to be updated anyway, so this strong requirement on dependency checking is overkill). Switching to timestamp checking (a simple matter with doit) will save time and make the analysis run much faster. (See Other updates for update.)
- Last but not least, the current pyhisto implementation of histograms has the advantage of having no outside dependency, but, being 100 % Python, its execution (when performing projections, cuts, …) is slow. To gain speed and reduce memory usage, we could rely on the C++ Boost library, which implements histograms. The scikit-hep/hist module already exposes boost::histogram in Python, so we just have to wrap it in a class that reads and writes the pyhisto format and interface, so that it can be smoothly interchanged with the current pure Python module.
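As an illustration of the first item above, a minimal context module could look like the following sketch (the file name and keys are assumptions, not the actual configuration):

```python
# context.py: load the shared analysis context once and expose it as constants.
import yaml

with open("context.yaml") as stream:
    _context = yaml.safe_load(stream)

DETECTOR_NAMES = _context["detectors"]        # e.g. ["ge1", "ge2", "ge3", "ge4"]
RAW_DATA_PATH = _context["paths"]["raw_data"]
RESULTS_PATH = _context["paths"]["results"]

# In a doit task script:
#   from context import DETECTOR_NAMES, RAW_DATA_PATH
```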
These improvements should allow faster processing overall and make it easier to reuse the analysis code on different platforms.
Additions to the code
To make the code more versatile, and in order to be able to use it for other data sets, two additions have to be made:
First, set up a file reader and processors for data recorded using the Faster acquisition system. A code to convert .fast files into 1D and 2D histograms in the same format already exists. One just needs to write the doit python scripts to run the conversion and make sure the outputs match the histograms required by the existing scripts in the analysis pipeline.
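Such a conversion task could look like the following sketch (the converter command fast2histo and the directory layout are assumptions, standing in for the existing conversion code):

```python
from pathlib import Path

def task_convert_fast():
    """Convert Faster .fast files into the text histograms used by the pipeline."""
    for fast_file in Path("raw_data").glob("*.fast"):
        target = Path("histograms") / (fast_file.stem + ".txt")
        yield {
            "name": fast_file.stem,
            "actions": [f"fast2histo {fast_file} {target}"],  # hypothetical converter command
            "file_dep": [str(fast_file)],
            "targets": [str(target)],
        }
```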
Some specific scripts and tasks may need to be written to accommodate the data recorded with the 36-pixel HPGe [10] included in Grapheme since 2014, but overall it will fit in the present framework easily. This will allow the use of the analysis software for the analysis of the 239Pu data recorded at JRC-Geel, or of future experiments at NFS, without any major hurdles.
The second addition is to set up the possibility to use the resimu [4], [5], and [6] stages once the numbers of neutrons and \(\gamma\)s have been extracted from the data, to compute the cross sections (see Full Monte Carlo data analysis to produce uncertainties and covariances). This would allow the generation of uncertainties and of covariance and correlation matrices with a faster processing time, since the Monte Carlo part would only be done on the last stage. (See Use of intermediate data semi-Monte Carlo code on that topic.)
Extension of 183W analysis
Independently of the code improvements, more transitions could be added to the analysis.
A bit more raw input data could be taken into consideration. Indeed, during the development of the code, some data (representing an additional 10 to 15 % of the current size) were put aside because, having low statistics, they required more work on the Ge detectors’ energy calibration. Including them in the analysis will push up the statistics in the spectra and make the analysis more reliable.
Using only one fission chamber in the flux determination (the UF4 one) will also significantly improve the accuracy of the flux and reduce the parametric uncertainty (at the cost of requiring more iterations).
We have already identified a few transitions that could be extracted, although with less statistics, so that finer work on the fit parameters will be needed and only a limited number of neutron energy ranges can be included in the analysis.
In particular, for the \((\text{n},~2\text{n})\) channel, the 100.1 keV transition from the 182W first excited state (2+) to the ground state (L01L00) should be extracted. It will be partially contaminated by the 99.1 keV transition in 183W, decaying from a \(\tfrac{5}{2}^{-}\) state to the ground state, which is fed by the isomeric state of this isotope, making the contamination trickier to handle (a neutron energy selection might not be enough to cut out the \((\text{n},~\text{n}'~\gamma)\) contribution). The 260 keV \(\gamma\) ray mentioned earlier could also be investigated. The 1121 keV \((\text{n},~2\text{n}~\gamma)\) transition, from the 2+ level at 1121.4 keV to the ground state in 182W, is also of interest in the \((\text{n},~2\text{n})\) channel.
In the \((\text{n},~\text{n}')\) channel, the 84 keV transition from the \(\tfrac{5}{2}^{-}\) level at 291.7 keV to the \(\tfrac{7}{2}^{-}\) level at 207 keV (L05L03) could also be looked at.
With these additional cross sections extracted from the data, and in combination with the \((\text{n},~\text{n}'~\gamma)\) and \((\text{n},~2\text{n}~\gamma)\) cross sections from the isotopically enriched 182W and 184W targets [2] and [11], we will have a strong dataset to constrain the models [12], [13].
Future uses
The code will be used in future analyses (whether the analysis of the 239Pu data or of data recorded at NFS). It offers full transparency on how the uncertainties are computed, which is needed for a meaningful evaluation of experimental data. By design, it is intended to be flexible and reusable with different data (in terms of raw data format, number of detectors, …), so it should become a standard in our data analysis.
Once its abilities are extended (as discussed in Perspectives), the use of the code will streamline the analysis process and may greatly shorten the time needed to go from raw data to final cross-section results.