Sparse dictionary learning

Sparse dictionary learning

Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims to find a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than any one of the signals being observed. These two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in sparsity and flexibility of the representation. One of the most important applications of sparse dictionary learning is in the field of compressed sensing or signal recovery. In compressed sensing, a high-dimensional signal can be recovered with only a few linear measurements, provided that the signal is sparse or near-sparse. Since not all signals satisfy this condition, it is crucial to find a sparse representation of that signal such as the wavelet transform or the directional gradient of a rasterized matrix. Once a matrix or a high-dimensional vector is transferred to a sparse space, different recovery algorithms like basis pursuit, CoSaMP, or fast non-iterative algorithms can be used to recover the signal. One of the key principles of dictionary learning is that the dictionary has to be inferred from the input data. The emergence of sparse dictionary learning methods was stimulated by the fact that in signal processing, one typically wants to represent the input data using a minimal amount of components. Before this approach, the general practice was to use predefined dictionaries such as Fourier or wavelet transforms. However, in certain cases, a dictionary that is trained to fit the input data can significantly improve the sparsity, which has applications in data decomposition, compression, and analysis, and has been used in the fields of image denoising and classification, and video and audio processing. Sparsity and overcomplete dictionaries have immense applications in image compression, image fusion, and inpainting. == Problem statement == Given the input dataset X = [ x 1 , . . . , x K ] , x i ∈ R d {\displaystyle X=[x_{1},...,x_{K}],x_{i}\in \mathbb {R} ^{d}} we wish to find a dictionary D ∈ R d × n : D = [ d 1 , . . . , d n ] {\displaystyle \mathbf {D} \in \mathbb {R} ^{d\times n}:D=[d_{1},...,d_{n}]} and a representation R = [ r 1 , . . . , r K ] , r i ∈ R n {\displaystyle R=[r_{1},...,r_{K}],r_{i}\in \mathbb {R} ^{n}} such that both ‖ X − D R ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}} is minimized and the representations r i {\displaystyle r_{i}} are sparse enough. This can be formulated as the following optimization problem: argmin D ∈ C , r i ∈ R n ∑ i = 1 K ‖ x i − D r i ‖ 2 2 + λ ‖ r i ‖ 0 {\displaystyle {\underset {\mathbf {D} \in {\mathcal {C}},r_{i}\in \mathbb {R} ^{n}}{\text{argmin}}}\sum _{i=1}^{K}\|x_{i}-\mathbf {D} r_{i}\|_{2}^{2}+\lambda \|r_{i}\|_{0}} , where C ≡ { D ∈ R d × n : ‖ d i ‖ 2 ≤ 1 ∀ i = 1 , . . . , n } {\displaystyle {\mathcal {C}}\equiv \{\mathbf {D} \in \mathbb {R} ^{d\times n}:\|d_{i}\|_{2}\leq 1\,\,\forall i=1,...,n\}} , λ > 0 {\displaystyle \lambda >0} C {\displaystyle {\mathcal {C}}} is required to constrain D {\displaystyle \mathbf {D} } so that its atoms would not reach arbitrarily high values allowing for arbitrarily low (but non-zero) values of r i {\displaystyle r_{i}} . λ {\displaystyle \lambda } controls the trade off between the sparsity and the minimization error. The minimization problem above is not convex because of the ℓ0-"norm" and solving this problem is NP-hard. In some cases L1-norm is known to ensure sparsity and so the above becomes a convex optimization problem with respect to each of the variables D {\displaystyle \mathbf {D} } and R {\displaystyle \mathbf {R} } when the other one is fixed, but it is not jointly convex in ( D , R ) {\displaystyle (\mathbf {D} ,\mathbf {R} )} . === Properties of the dictionary === The dictionary D {\displaystyle \mathbf {D} } defined above can be "undercomplete" if n < d {\displaystyle n d {\displaystyle n>d} with the latter being a typical assumption for a sparse dictionary learning problem. The case of a complete dictionary does not provide any improvement from a representational point of view and thus isn't considered. Undercomplete dictionaries represent the setup in which the actual input data lies in a lower-dimensional space. This case is strongly related to dimensionality reduction and techniques like principal component analysis which require atoms d 1 , . . . , d n {\displaystyle d_{1},...,d_{n}} to be orthogonal. The choice of these subspaces is crucial for efficient dimensionality reduction, but it is not trivial. And dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification. However, their main downside is limiting the choice of atoms. Overcomplete dictionaries, however, do not require the atoms to be orthogonal (they will never have a basis anyway) thus allowing for more flexible dictionaries and richer data representations. An overcomplete dictionary which allows for sparse representation of signal can be a famous transform matrix (wavelets transform, fourier transform) or it can be formulated so that its elements are changed in such a way that it sparsely represents the given signal in a best way. Learned dictionaries are capable of giving sparser solutions as compared to predefined transform matrices. == Algorithms == As the optimization problem described above can be solved as a convex problem with respect to either dictionary or sparse coding while the other one of the two is fixed, most of the algorithms are based on the idea of iteratively updating one and then the other. The problem of finding an optimal sparse coding R {\displaystyle R} with a given dictionary D {\displaystyle \mathbf {D} } is known as sparse approximation (or sometimes just sparse coding problem). A number of algorithms have been developed to solve it (such as matching pursuit and LASSO) and are incorporated in the algorithms described below. === Method of optimal directions (MOD) === The method of optimal directions (or MOD) was one of the first methods introduced to tackle the sparse dictionary learning problem. The core idea of it is to solve the minimization problem subject to the limited number of non-zero components of the representation vector: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T} Here, F {\displaystyle F} denotes the Frobenius norm. MOD alternates between getting the sparse coding using a method such as matching pursuit and updating the dictionary by computing the analytical solution of the problem given by D = X R + {\displaystyle \mathbf {D} =XR^{+}} where R + {\displaystyle R^{+}} is a Moore-Penrose pseudoinverse. After this update D {\displaystyle \mathbf {D} } is renormalized to fit the constraints and the new sparse coding is obtained again. The process is repeated until convergence (or until a sufficiently small residue). MOD has proved to be a very efficient method for low-dimensional input data X {\displaystyle X} requiring just a few iterations to converge. However, due to the high complexity of the matrix-inversion operation, computing the pseudoinverse in high-dimensional cases is in many cases intractable. This shortcoming has inspired the development of other dictionary learning methods. === K-SVD === K-SVD is an algorithm that performs SVD at its core to update the atoms of the dictionary one by one and basically is a generalization of K-means. It enforces that each element of the input data x i {\displaystyle x_{i}} is encoded by a linear combination of not more than T 0 {\displaystyle T_{0}} elements in a way identical to the MOD approach: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T 0 {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T_{0}} This algorithm's essence is to first fix the dictionary, find the best possible R {\displaystyle R} under the above constraint (using Orthogonal Matching Pursuit) and then iteratively update the atoms of dictionary D {\displaystyle \mathbf {D} } in the following manner: ‖ X − D R ‖ F 2 = | X − ∑ i = 1 K d i x T i | F 2 = ‖ E k − d k x T k ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}=\left|X-\sum _{i=1}^{K}d_{i}x_{T}^{i}\right|_{F}^{2}=\|E_{k}-d_{k}x_{T}^{k}\|_{F}^{2}} The next steps of the algorithm include rank-1 approximation of the residual matrix E k {\displaystyle E_{k}} , updating d k {\displaystyle d_{k}} and enforcing the s

VueScan

VueScan is a computer program for image scanning, especially of photographs, including negatives. It supports optical character recognition (OCR) of text documents. The software can be downloaded and used free of charge, but adds a watermark on scans until a license is purchased. == Purpose == VueScan is intended to work with a large number of image scanners, excluding specialised professional scanners such as drum scanners, on many computer operating systems (OS), even if drivers for the scanner are not available for the OS. These scanners are supplied with device drivers and software to operate them, included in their price. A 2014 review considered that the reasons to purchase VueScan are to allow older scanners not supported by drivers for newer operating systems to be used in more up-to-date systems and for better scanning and processing of photographs (prints; also slides and negatives when supported by scanners) than is afforded by manufacturers' software. The review did not report any advantages to VueScan's processing of documents over other software. The reviewer considered VueScan comparable to SilverFast, a similar program, with support for some specific scanners better in one or the other. Vuescan supports more scanners, with a single purchase giving access to the full range of both film and flatbed scanners, and costs less. The VueScan program can be used with its own drivers or with drivers supplied by the scanner manufacturer, if supported by the operating system. VueScan drivers can also be used without the VueScan program by application software that supports scanning directly, such as Adobe Photoshop, again enabling the use of scanners without current manufacturers' drivers. In 2019 when Apple released macOS Catalina, they removed support for running 32-bit programs, including 32-bit drivers for scanning equipment. In response, Hamrick released VueScan 9.7, effectively saving thousands of scanners from being rendered obsolete. == Overview == VueScan enables the user to modify and fine-tune the scanning parameters. The program uses its own independent method to interface with scanner hardware, and can support many older scanners under computer operating systems for which drivers are not available, allowing old scanners to be used with newer platforms that do not otherwise support them. VueScan supports an increasing number of scanners and digital cameras; 2,400 on Windows, 2,100 on Mac OS X and 1,900 on Linux in 2018. VueScan is supplied as one downloadable file for each operating system, which supports the full range of scanners. Without the purchase of a license, the program runs in fully functional demonstration mode, identical to Professional mode, except that watermarks are superimposed on saved and printed images. Purchase of a license removes the watermark. A standard license allows updates for one year; a professional license allows unlimited updates and provides some additional features. VueScan supports optical character recognition (OCR), with English included, and 32 additional language packages available on its website. In September 2011, VueScan co-developer Ed Hamrick said that he was selling US$3 million per year of VueScan licenses.

Conditional disclosure of secrets

Conditional disclosure of secrets (CDS) is a primitive, studied in information-theoretic cryptography, that allows distributed, non-communicating parties to coordinate the release of information to a third party. CDS was initially introduced for use in the context of private information retrieval, and has been related to communication complexity and non-local quantum computation. == Definition of conditional disclosure of secrets == The conditional disclosure of secrets setting involves three players; Alice, Bob and the referee. Alice receives an input x ∈ { 0 , 1 } n {\displaystyle x\in \{0,1\}^{n}} and a secret z ∈ { 0 , 1 } {\displaystyle z\in \{0,1\}} , and Bob receives a string y ∈ { 0 , 1 } n {\displaystyle y\in \{0,1\}^{n}} . A choice of Boolean function f : { 0 , 1 } 2 n → { 0 , 1 } {\displaystyle f:\{0,1\}^{2n}\rightarrow \{0,1\}} is fixed in advance and known to all players. Alice and Bob cannot communicate with one another, but share a string of random bits which we label r {\displaystyle r} . Alice and Bob compute messages m A = m A ( x , z , r ) {\displaystyle m_{A}=m_{A}(x,z,r)} and m B = m B ( y , r ) {\displaystyle m_{B}=m_{B}(y,r)} , which they send to the referee. The referee knows ( x , y ) {\displaystyle (x,y)} . A CDS protocol consists of the encoding maps applied by Alice and Bob. A protocol is said to be ϵ {\displaystyle \epsilon } -correct if, for all ( x , y ) ∈ f − 1 ( 1 ) {\displaystyle (x,y)\in f^{-1}(1)} , the referee can determine z {\displaystyle z} with probability 1 − ϵ {\displaystyle 1-\epsilon } . A protocol is said to be δ {\displaystyle \delta } -secure if, for all ( x , y ) ∈ f − 1 ( 0 ) {\displaystyle (x,y)\in f^{-1}(0)} the distribution of the messages is δ {\displaystyle \delta } close to a simulator distribution (in total variation distance), where the simulator distribution is independent of z {\displaystyle z} . The communication complexity of a CDS protocol P is the total number of bits of message sent by Alice and Bob. The CDS communication cost of a function, C D S ϵ , δ ( f ) {\displaystyle CDS_{\epsilon ,\delta }(f)} is the minimal communication cost of an ϵ {\displaystyle \epsilon } -correct, δ {\displaystyle \delta } secure protocol that implements f {\displaystyle f} . The randomness complexity and randomness cost of implementing a function in the CDS model are defined similarly, but consider the number of bits of shared random bits held by Alice and Bob. == Basic properties of the primitive == === Amplification === Supposing we have an ϵ {\displaystyle \epsilon } -correct and δ {\displaystyle \delta } -secure CDS protocol, it is known that we can find a new protocol which reduces the correctness and privacy errors at the expense of an increased communication and randomness cost. More specifically, the following theorem has been proven Theorem (Amplification). A CDS protocol for f which supports a single-bit secret with privacy and correctness error of 1/3 can be transformed into a new CDS protocol with privacy and correctness error of 2 − Ω ( k ) {\displaystyle 2^{-\Omega (k)}} and communication/randomness complexity which are larger than those of the original protocol by a multiplicative factor of O(k). In fact, somewhat more than the above theorem is true in that the size of the secret can also be made to be of length k {\displaystyle k} , while simultaneously reducing the correctness and privacy errors as above. The proof involves first encoding the secret z {\displaystyle z} into a secret sharing scheme, and then running the original CDS protocol on each share of the resulting scheme. === Closure === If a CDS protocol for a function f {\displaystyle f} is known, then certain simple modifications of f {\displaystyle f} have CDS protocols with similar efficiency. The simplest case is to consider a CDS protocol for function f {\displaystyle f} and ask for a similarly efficient protocol for the negation of f {\displaystyle f} , labelled ¬ f {\displaystyle \neg f} . This is addressed by the following theorem Theorem (CDS is closed under complement). Suppose that f has a CDS protocol with randomness cost of ρ {\displaystyle \rho } bits, communication complexity of t {\displaystyle t} bits, and privacy and correctness errors δ = ϵ = 2 − k {\displaystyle \delta =\epsilon =2^{-k}} . Then ¬ f {\displaystyle \neg f} has a CDS scheme with similar privacy and correctness errors, and randomness and communication complexity of O ( k 3 ρ 2 t + k 3 ρ 3 ) {\displaystyle O(k^{3}\rho ^{2}t+k^{3}\rho ^{3})} . The cost of a CDS protocol is also closed under formula's, in the following sense. Consider two functions f 1 {\displaystyle f_{1}} and f 2 {\displaystyle f_{2}} . Then, the communication and randomness costs of f 1 ∧ f 2 {\displaystyle f_{1}\wedge f_{2}} as well as f 1 ∨ f 2 {\displaystyle f_{1}\vee f_{2}} are not much larger than the sum of the costs for f 1 {\displaystyle f_{1}} and f 2 {\displaystyle f_{2}} . See Applebaum et al. for a precise statement. == Upper and lower bounds on communication cost == Given a function f {\displaystyle f} we would like to understand the communication and randomness costs to implement f {\displaystyle f} in the CDS setting. Towards understanding this, protocols for implementing CDS have been developed (which give an upper bound on the cost) and lower bound strategies have been developed. For most functions, there is a large gap between the known upper and lower bound, so understanding the cost of CDS remains largely an open problem. This section presents some of what is known so far about the cost of CDS. === Secret sharing based upper bound === A subject with a close relationship to CDS is secret sharing. Secret sharing constructions provide an upper bound on the cost of CDS protocols. A secret sharing scheme encodes a secret, s {\displaystyle s} into a set of shares S 1 , . . . , S n {\displaystyle S_{1},...,S_{n}} . Associated to any secret sharing scheme is an access structure, which consists of a set of authorized sets A = A 1 , . . . , A k {\displaystyle {\mathcal {A}}={A_{1},...,A_{k}}} with A i ⊆ { S 1 , . . . , S n } {\displaystyle A_{i}\subseteq \{S_{1},...,S_{n}\}} . The authorized sets are those subsets of the A i {\displaystyle A_{i}} from which it is possible to recover the secret recorded into the scheme. A succinct way to describe an access structure is in terms of a function f A : { 0 , 1 } n → { 0 , 1 } {\displaystyle f_{\mathcal {A}}:\{0,1\}^{n}\rightarrow \{0,1\}} . Each subset of the shares K [ x ] ⊂ { S 1 , . . . , S n } {\displaystyle K[x]\subset \{S_{1},...,S_{n}\}} is labelled by a string x ∈ { 0 , 1 } n {\displaystyle x\in \{0,1\}^{n}} such that x i = 1 {\displaystyle x_{i}=1} if and only if S i ∈ K {\displaystyle S_{i}\in K} . Then we define f A {\displaystyle f_{\mathcal {A}}} to be such that f A ( x ) = 1 {\displaystyle f_{\mathcal {A}}(x)=1} if and only if K [ x ] ∈ A {\displaystyle K[x]\in {\mathcal {A}}} . In words, the function f A {\displaystyle f_{\mathcal {A}}} is 1 when given an authorized subset as input, and 0 otherwise. A basic result in the theory of secret sharing is that an access structure A {\displaystyle {\mathcal {A}}} can be realized in a secret sharing scheme if and only if f A {\displaystyle f_{\mathcal {A}}} is monotone. The size of a secret sharing scheme is defined as the total number of bits in the shares S i {\displaystyle S_{i}} . For monotone functions, there is an upper bound on the communication cost in CDS for any monotone function f {\displaystyle f} in terms of the size of any secret sharing scheme with access structure given by f {\displaystyle f} , C D S ϵ = 0 , δ = 0 ( f ) ≤ S h a r i n g S i z e ( f ) {\displaystyle CDS_{\epsilon =0,\delta =0}(f)\leq SharingSize(f)} For some concrete classes of secret sharing schemes, this relationship can be extended to general (non-monotone) Boolean functions. This leads to an upper bound on CDS cost in terms of the size of any span program that computes f {\displaystyle f} , C D S ϵ = 0 , δ = 0 ( f ) ≤ S P k ( f ) {\displaystyle CDS_{\epsilon =0,\delta =0}(f)\leq SP_{k}(f)} The class of problems with efficient (polynomial size) span program is the complexity class M o d k L {\displaystyle Mod_{k}L} , so problems in this class have efficient CDS protocols. === Sub-exponential upper bounds for all functions === Using a matching vector family based construction, it has been proven that ∀ f , C D S ϵ = 0 , δ = 0 ( f ) ≤ 2 O ( n log ⁡ n ) {\displaystyle \forall f,\,\,\,\,\,\,CDS_{\epsilon =0,\delta =0}(f)\leq 2^{O({\sqrt {n\log n}})}} . The technique for this proof is similar to one used to prove upper bounds on private information retrieval. This upper bound on CDS also leads to sub-exponential upper bounds on the size of a large class of secret sharing schemes. === Lower bounds from communication complexity === In a CDS protocol, the referee is given the inputs ( x , y ) {\displaystyle (x,y)} . This means it is not clear if the messages sent by Alice a

Data validation and reconciliation

Industrial process data validation and reconciliation, or more briefly, process data reconciliation (PDR), is a technology that uses process information and mathematical methods in order to automatically ensure data validation and reconciliation by correcting measurements in industrial processes. The use of PDR allows for extracting accurate and reliable information about the state of industry processes from raw measurement data and produces a single consistent set of data representing the most likely process operation. == Models, data and measurement errors == Industrial processes, for example chemical or thermodynamic processes in chemical plants, refineries, oil or gas production sites, or power plants, are often represented by two fundamental means: Models that express the general structure of the processes, Data that reflects the state of the processes at a given point in time. Models can have different levels of detail, for example one can incorporate simple mass or compound conservation balances, or more advanced thermodynamic models including energy conservation laws. Mathematically the model can be expressed by a nonlinear system of equations F ( y ) = 0 {\displaystyle F(y)=0\,} in the variables y = ( y 1 , … , y n ) {\displaystyle y=(y_{1},\ldots ,y_{n})} , which incorporates all the above-mentioned system constraints (for example the mass or heat balances around a unit). A variable could be the temperature or the pressure at a certain place in the plant. === Error types === Data originates typically from measurements taken at different places throughout the industrial site, for example temperature, pressure, volumetric flow rate measurements etc. To understand the basic principles of PDR, it is important to first recognize that plant measurements are never 100% correct, i.e. raw measurement y {\displaystyle y\,} is not a solution of the nonlinear system F ( y ) = 0 {\displaystyle F(y)=0\,\!} . When using measurements without correction to generate plant balances, it is common to have incoherencies. Measurement errors can be categorized into two basic types: random errors due to intrinsic sensor accuracy and systematic errors (or gross errors) due to sensor calibration or faulty data transmission. Random errors means that the measurement y {\displaystyle y\,\!} is a random variable with mean y ∗ {\displaystyle y^{}\,\!} , where y ∗ {\displaystyle y^{}\,\!} is the true value that is typically not known. A systematic error on the other hand is characterized by a measurement y {\displaystyle y\,\!} which is a random variable with mean y ¯ {\displaystyle {\bar {y}}\,\!} , which is not equal to the true value y ∗ {\displaystyle y^{}\,} . For ease in deriving and implementing an optimal estimation solution, and based on arguments that errors are the sum of many factors (so that the Central limit theorem has some effect), data reconciliation assumes these errors are normally distributed. Other sources of errors when calculating plant balances include process faults such as leaks, unmodeled heat losses, incorrect physical properties or other physical parameters used in equations, and incorrect structure such as unmodeled bypass lines. Other errors include unmodeled plant dynamics such as holdup changes, and other instabilities in plant operations that violate steady state (algebraic) models. Additional dynamic errors arise when measurements and samples are not taken at the same time, especially lab analyses. The normal practice of using time averages for the data input partly reduces the dynamic problems. However, that does not completely resolve timing inconsistencies for infrequently-sampled data like lab analyses. This use of average values, like a moving average, acts as a low-pass filter, so high frequency noise is mostly eliminated. The result is that, in practice, data reconciliation is mainly making adjustments to correct systematic errors like biases. === Necessity of removing measurement errors === ISA-95 is the international standard for the integration of enterprise and control systems It asserts that: Data reconciliation is a serious issue for enterprise-control integration. The data have to be valid to be useful for the enterprise system. The data must often be determined from physical measurements that have associated error factors. This must usually be converted into exact values for the enterprise system. This conversion may require manual, or intelligent reconciliation of the converted values [...]. Systems must be set up to ensure that accurate data are sent to production and from production. Inadvertent operator or clerical errors may result in too much production, too little production, the wrong production, incorrect inventory, or missing inventory. == History == PDR has become more and more important due to industrial processes that are becoming more and more complex. PDR started in the early 1960s with applications aiming at closing material balances in production processes where raw measurements were available for all variables. At the same time the problem of gross error identification and elimination has been presented. In the late 1960s and 1970s unmeasured variables were taken into account in the data reconciliation process., PDR also became more mature by considering general nonlinear equation systems coming from thermodynamic models., , Quasi steady state dynamics for filtering and simultaneous parameter estimation over time were introduced in 1977 by Stanley and Mah. Dynamic PDR was formulated as a nonlinear optimization problem by Liebman et al. in 1992. == Data reconciliation == Data reconciliation is a technique that targets at correcting measurement errors that are due to measurement noise, i.e. random errors. From a statistical point of view the main assumption is that no systematic errors exist in the set of measurements, since they may bias the reconciliation results and reduce the robustness of the reconciliation. Given n {\displaystyle n} measurements y i {\displaystyle y_{i}} , data reconciliation can mathematically be expressed as an optimization problem of the following form: min x , y ∗ ∑ i = 1 n ( y i ∗ − y i σ i ) 2 subject to F ( x , y ∗ ) = 0 y min ≤ y ∗ ≤ y max x min ≤ x ≤ x max , {\displaystyle {\begin{aligned}\min _{x,y^{}}&\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\\{\text{subject to }}&F(x,y^{})=0\\&y_{\min }\leq y^{}\leq y_{\max }\\&x_{\min }\leq x\leq x_{\max },\end{aligned}}\,\!} where y i ∗ {\displaystyle y_{i}^{}\,\!} is the reconciled value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), y i {\displaystyle y_{i}\,\!} is the measured value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), x j {\displaystyle x_{j}\,\!} is the j {\displaystyle j} -th unmeasured variable ( j = 1 , … , m {\displaystyle j=1,\ldots ,m\,\!} ), and σ i {\displaystyle \sigma _{i}\,\!} is the standard deviation of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), F ( x , y ∗ ) = 0 {\displaystyle F(x,y^{})=0\,\!} are the p {\displaystyle p\,\!} process equality constraints and x min , x max , y min , y max {\displaystyle x_{\min },x_{\max },y_{\min },y_{\max }\,\!} are the bounds on the measured and unmeasured variables. The term ( y i ∗ − y i σ i ) 2 {\displaystyle \left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\,\!} is called the penalty of measurement i. The objective function is the sum of the penalties, which will be denoted in the following by f ( y ∗ ) = ∑ i = 1 n ( y i ∗ − y i σ i ) 2 {\displaystyle f(y^{})=\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}} . In other words, one wants to minimize the overall correction (measured in the least squares term) that is needed in order to satisfy the system constraints. Additionally, each least squares term is weighted by the standard deviation of the corresponding measurement. The standard deviation is related to the accuracy of the measurement. For example, at a 95% confidence level, the standard deviation is about half the accuracy. === Redundancy === Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently from redundancy in information theory. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy", "analytical redundancy", or "topological redundancy". Redundancy can be due to sensor redundancy, where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints. Redundancy is linked to the concept

Peñabot

Peñabot is the nickname for automated social media accounts allegedly used by the Mexican government of Enrique Peña Nieto and the PRI political party to keep unfavorable news from reaching the Mexican public. Peñabot accusations are related to the broader issue of fake news in the 21st century. == History of disinformation in Mexican politics == The PRI political party has been reported to use fake news since before Peña Nieto. The main tactic originally was to spread such propaganda through open radio and television networks. Such tactic was effective in Mexico, because newspaper readership is low and cable TV is largely limited to the middle classes; consequently, the country's two major television networks – Televisa and TV Azteca – exert a significant influence in national politics. Televisa itself, not only owns around two-thirds of the programming on Mexico's TV channels, making it not only Mexico's largest television network, but also is the largest media network in the Spanish-speaking world. == Peñabots == Analysts have given the name Peñabots to a suspected network of automated accounts on social media used by the Mexican government to spread pro-government propaganda and to marginalize dissenting opinions in social media. The bots were first noticed in the 2012 elections when they were used to disseminate opinions in support of Enrique Peña Nieto on social networks such as Twitter and Facebook. According to Aristegui Noticias, their usage went against articles 6 and 134 of the Mexican Constitution. Those used by Peña Nieto's government cost an estimated 80 million pesos monthly, which news outlets argued only helped the government spread fake support towards the president, but did not have a benefit towards Mexican people (with whom EPN was highly unpopular). Facebook held approximately 640,321 Peñabots, while Twitter had less. As of July 2017, Oxford Internet Institute's Computational Propaganda Research Project claimed many western democracies, Mexico included, perform social media manipulation, thus saying the manipulation comes directly from the Mexican government itself. During Peña Nieto's subsequent presidency, analysts noted that Peñabots were used to overpower trending topics that critiqued government, to flood trending government critical hashtags with spam, to create fake trends by pushing alternative hashtags, and to push smear campaigns and threats against government-critical activists and journalists. Peñabots were distinguished as their pattern of activity was distinct from that of ordinary interaction on social networks. === Meadebots === On Twitter it was reported that about 94% of the followers of 2018 presidential candidate from the PRI Jose Antonio Meade were bots. When Antonio Meade presented himself as a candidate for the 2018 presidential election, his social media accounts such as "@MovimientoMEADE" (created by the PRI's official account @PRI_Nacional), obtained a huge quantity of followers in a short span of time. Some users noticed and brought it to attention, and after investigation it was reported 94% of such followers were bots (702,000 out of 747,000), and the account was eliminated from Twitter after 20 hours. The fake accounts used the hashtags #YoConMeade and #Meade18. It was further revealed was that Meade's official account on Twitter, @JoseAMeadeK had 25% bots (216,000 fake followers out of the 981,000). == Manipulation of news media in Mexico, through television == The Mexican government of Peña Nieto has been accused of using various means to keep unfavorable news from reaching the Mexican people. Many Mexicans have protested this practice as it clearly goes against the freedom of speech. The PRI has been reported to use fake news since before Peña Nieto. The main tactic has been to spread such propaganda through radio and television. This tactic is perceived as effective in Mexico, because newspaper readership is low and research on the Internet and cable TV is largely limited to the middle classes; consequently, the country's two major television networks – Televisa and TV Azteca – exert a significant influence in national politics. Televisa itself, owns around two-thirds of the programming on Mexico's TV channels, making it not only Mexico's largest television network, but also is the largest media network in the Spanish-speaking world. In June 2012, before the 2012 Mexican presidential elections, the British newspaper The Guardian published a series of allegations claiming Televisa, sold favorable coverage to top politicians in its news and entertainment shows, this scandal became known as the Televisa controversy. The documents published by 'The Guardian alleged that a secretive circle within Televisa manipulated news coverage to favor PRI presidential candidate Enrique Peña Nieto, who was poised as favorite to win. Televisa's secret circle supposedly commissioned videos to promote Peña Nieto and lash out his political rivals in 2009. The Guardian documents suggest that Televisa's secret team distributed such videos through e-mail, posting them posted them on Facebook and YouTube, some can still be seen there. Another document was a PowerPoint presentation, with a slide explicitly aimed at rival leftist candidate of the Party of the Democratic Revolution (PRD), Andrés Manuel López Obrador. Supposedly given to The Guardian by a Televisa employee. The document's authenticity was never possible to confirm– however dates, names, and events largely coincide. Televisa refused to talk the documents, and denied a relationship with the PRI or its presidential candidate, saying that they had provided equal media coverage to all parties. Televisa published an article supposedly showing discrepancies in The Guardian documents and denying accusations. Mexican citizens complained about the perceived favoritism towards Enrique Peña Nieto and the PRI, protesting through the Yo Soy 132 movement which Televisa covered in detail. However, Televisa's news media coverage is perceived to have been biased, by using a media coverage tactic Mexican citizens call cortinas de humo (smoke screens). These introduce a news scandal giving extensive coverage to distract citizens from a potential conflict-of-interest or controversy that could damage the image of the politician favored by the network. An example of a perceived smoke screen would be the news media coverage of "Caso Michoacán" and "Caso Paolette" distracting all the attention from the parallel "Yo soy 132" movement. A few years later, on the day of September 11, 2016; factual evidence of Televisa's performing media manipulation emerged, when a Televisa news anchor while live-on air reading a teleprompter, mistakenly read out loud that "try that Jaime "Ël Bronco" Rodríguez Calderón (Nuevo Leon's governor) is mentioned as little as possible". Newspaper El Universal caught it on video and published it social media. Televisa didn't mention the story and declined to comment. Lack of news coverage concerning Nuevo León's Governor Jaime Rodriguez, is perceived due to him being the first elected governor to not be part of any political party (Independent Governor), and because unlike the governors from the PRI preceding him, the independent governor "El Bronco" doesn't spend money on publicity at all, preferring to communicate all news by using social media such as Twitter and Facebook. While the incident may have proven Televisa's bias, there wasn't anything to incriminate the PRI political party or Enrique Peña Nieto, though it did further suspicion of Televisa manipulating news media. In contrast, a December 2017 article of The New York Times, reported Enrique Peña Nieto spending about 2000 million dollars on publicity, during his first 5 years as president, the largest publicity budget ever spent by a Mexican President. Additionally, 68 percent of news journalists admitted to not believe to have enough freedom of speech, and award-winning news reporter Carmen Aristegui was controversially fired shortly after revealing the Mexican White House scandals. == Violence and spying towards news journalists and civil rights activists == Far for only being receiving accusations of spreading fake news, the Mexican government of EPN (Enrique Peña Nieto) has also been accused of violence towards news journalists, and of spying on them, and also towards civil right leaders and their families. During his tenure as president, Peña Nieto has been accused of failing to protect news journalists, whose deaths are speculated to be politically triggered, by politicians attempting to prevent them from covering political scandals. The New York Times published a news report on the matter titled, "In Mexico it's easy to kill a journalist", on it mentioning how during EPN's government, Mexico became one of the worst countries on which to be a journalist. The assassination of journalist Javier Valdez on May 23, 2017, received national coverage, with multiple news journalists

Colors!

Colors! is a series of digital painting applications for handheld game consoles and mobile devices. Originally created as a homebrew application for Nintendo DS (as Colors!), which was since legitimately distributed on PlayStation Vita, iOS, and Android, the project eventually evolved into an officially licensed application for Nintendo 3DS (as Colors! 3D) and Nintendo Switch (as Colors Live). == History == === Colors! === Colors! was originally released in June 2007 as a simple homebrew painting application for the Nintendo DS. It was developed by Jens Andersson, a programmer and designer on sabbatical from the games industry who wanted to experiment with the potential of the new handheld platform. Shortly after, Rafał Piasek created an online gallery where users could upload paintings made with the program. Colors! quickly became one of the best-known homebrew applications on the Nintendo DS, and in September 2008, it was also released for the iPhone and iPod Touch. As of August 2010, it had been downloaded almost half a million times. It was voted the most popular homebrew application on the Nintendo DS by readers of the R4 for DS blog. Development of Colors! DS homebrew officially ended in December 2010 although the official gallery still accepted submissions from DS users until 2020 when Colors! Gallery was discontinued. === Colors! 3D === Colors! 3D is a successor to the application Colors! for the Nintendo 3DS. It was released as an officially licensed application for the Nintendo eShop in North America on April 5, 2012, and in the PAL region on April 19, 2012. It was later released in Japan on August 21, 2013, published by Arc System Works. Colors! 3D allows users to draw on five layers, each on their own stereoscopic 3D plane. Drawing is done on the bottom screen, while the top screen displays the painting in 3D. While drawing, players can use the various controls on the Nintendo 3DS to change layers, zoom and pan, and alter the pressure of their brush. Pressing the L button allows users to access a menu to change brush type, size, and opacity, modify the layers, use the camera to provide references, and more. When the user finishes their painting, they can export it to the SD card for viewing in the Nintendo 3DS Camera application. Users can also upload their finished creations to an online gallery, viewed on the 3DS or the official website. Gallery features include hashtags and the ability to follow artists and post comments. Each painting also features a replay feature that allows viewers to see how it was drawn. The application also features local multiplayer, allowing several people to work cooperatively on a painting. In April 2024, the developers of Colors! 3D collaborated with the Pretendo Network project to officially add support for the application, meaning Colors! 3D will continue to operate as normal when using Pretendo Network. ==== Reception ==== IGN gave the application a score of 9.0 and an Editor's Choice award, praising its simple interface and tutorials. Destructoid gave the app a 9.0, calling it "a simple and incredibly fun tool with an amazing community of artists proudly displaying their beautiful and funny 3D images." Nintendo Life gave the app a 9/10, stating, "Though lacking in any structured play, Colors! 3D’s robust free drawing system and unique ability to let anyone create their own three-dimensional artwork more than make up for this." === Colors Live === A Nintendo Switch successor called Colors Live (stylised as Colors L!ve) was released in 2020 after being funded via a Kickstarter campaign. This expanded upon the features of previous installments by adding new brushes, increasing the maximum number of layers to ten, and introducing blend modes. A new game mode called Colors Quest was also included. A pressure-sensitive pen called the Colors SonarPen was developed in collaboration with GreenBulb to facilitate drawing on the Nintendo Switch, and comes pre-bundled with physical copies of the game. ==== Colors Quest ==== This new mode acts as a story-driven adventure wherein players are given a daily drawing challenge with a specific theme and certain stipulations that must be fulfilled. Once the drawing is complete, players must anonymously score other players' submissions, these scores are then aggregated to produce a personal ranking that measures the improvement in the player's art skills over time.

Cryptovirology

Cryptovirology refers to the study of cryptography use in malware, such as ransomware and asymmetric backdoors. Traditionally, cryptography and its applications are defensive in nature, and provide privacy, authentication, and security to users. Cryptovirology employs a twist on cryptography, showing that it can also be used offensively. It can be used to mount extortion based attacks that cause loss of access to information, loss of confidentiality, and information leakage, tasks which cryptography typically prevents. The field was born with the observation that public-key cryptography can be used to break the symmetry between what an antivirus analyst sees regarding malware and what the attacker sees. The antivirus analyst sees a public key contained in the malware, whereas the attacker sees the public key contained in the malware as well as the corresponding private key (outside the malware) since the attacker created the key pair for the attack. The public key allows the malware to perform trapdoor one-way operations on the victim's computer that only the attacker can undo. == Overview == The field encompasses covert malware attacks in which the attacker securely steals private information such as symmetric keys, private keys, PRNG state, and the victim's data. Examples of such covert attacks are asymmetric backdoors. An asymmetric backdoor is a backdoor (e.g., in a cryptosystem) that can be used only by the attacker, even after it is found. This contrasts with the traditional backdoor that is symmetric, i.e., anyone that finds it can use it. Kleptography, a subfield of cryptovirology, is the study of asymmetric backdoors in key generation algorithms, digital signature algorithms, key exchanges, pseudorandom number generators, encryption algorithms, and other cryptographic algorithms. The NIST Dual EC DRBG random bit generator has an asymmetric backdoor in it. The EC-DRBG algorithm utilizes the discrete-log kleptogram from kleptography, which by definition makes the EC-DRBG a cryptotrojan. Like ransomware, the EC-DRBG cryptotrojan contains and uses the attacker's public key to attack the host system. The cryptographer Ari Juels indicated that NSA effectively orchestrated a kleptographic attack on users of the Dual EC DRBG pseudorandom number generation algorithm and that, although security professionals and developers have been testing and implementing kleptographic attacks since 1996, "you would be hard-pressed to find one in actual use until now." Due to public outcry about this cryptovirology attack, NIST rescinded the EC-DRBG algorithm from the NIST SP 800-90 standard. Covert information leakage attacks carried out by cryptoviruses, cryptotrojans, and cryptoworms that, by definition, contain and use the public key of the attacker is a major theme in cryptovirology. In "deniable password snatching," a cryptovirus installs a cryptotrojan that asymmetrically encrypts host data and covertly broadcasts it. This makes it available to everyone, noticeable by no one (except the attacker), and only decipherable by the attacker. An attacker caught installing the cryptotrojan claims to be a virus victim. An attacker observed receiving the covert asymmetric broadcast is one of the thousands, if not millions of receivers, and exhibits no identifying information whatsoever. The cryptovirology attack achieves "end-to-end deniability." It is a covert asymmetric broadcast of the victim's data. Cryptovirology also encompasses the use of private information retrieval (PIR) to allow cryptoviruses to search for and steal host data without revealing the data searched for even when the cryptotrojan is under constant surveillance. By definition, such a cryptovirus carries within its own coding sequence the query of the attacker and the necessary PIR logic to apply the query to host systems. == History == The first cryptovirology attack and discussion of the concept was by Adam L. Young and Moti Yung, at the time called "cryptoviral extortion" and it was presented at the 1996 IEEE Security & Privacy conference. In this attack, a cryptovirus, cryptoworm, or cryptotrojan contains the public key of the attacker and hybrid encrypts the victim's files. The malware prompts the user to send the asymmetric ciphertext to the attacker who will decipher it and return the symmetric decryption key it contains for a fee. The victim needs the symmetric key to decrypt the encrypted files if there is no way to recover the original files (e.g., from backups). The 1996 IEEE paper predicted that cryptoviral extortion attackers would one day demand e-money, long before Bitcoin even existed. Many years later, the media relabeled cryptoviral extortion as ransomware. In 2016, cryptovirology attacks on healthcare providers reached epidemic levels, prompting the U.S. Department of Health and Human Services to issue a Fact Sheet on Ransomware and HIPAA. The fact sheet states that when electronic protected health information is encrypted by ransomware, a breach has occurred, and the attack therefore constitutes a disclosure that is not permitted under HIPAA, the rationale being that an adversary has taken control of the information. Sensitive data might never leave the victim organization, but the break-in may have allowed data to be sent out undetected. California enacted a law that defines the introduction of ransomware into a computer system with the intent of extortion as being against the law. == Examples == === Tremor virus === While viruses in the wild have used cryptography in the past, the only purpose of such usage of cryptography was to avoid detection by antivirus software. For example, the tremor virus used polymorphism as a defensive technique in an attempt to avoid detection by anti-virus software. Though cryptography does assist in such cases to enhance the longevity of a virus, the capabilities of cryptography are not used in the payload. The One-half virus was amongst the first viruses known to have encrypted affected files. === Tro_Ransom.A virus === An example of a virus that informs the owner of the infected machine to pay a ransom is the virus nicknamed Tro_Ransom.A. This virus asks the owner of the infected machine to send $10.99 to a given account through Western Union. Virus.Win32.Gpcode.ag is a classic cryptovirus. This virus partially uses a version of 660-bit RSA and encrypts files with many different extensions. It instructs the owner of the machine to email a given mail ID if the owner desires the decryptor. If contacted by email, the user will be asked to pay a certain amount as ransom in return for the decryptor. === CAPI === It has been demonstrated that using just 8 different calls to Microsoft's Cryptographic API (CAPI), a cryptovirus can satisfy all its encryption needs. == Other uses of cryptography-enabled malware == Apart from cryptoviral extortion, there are other potential uses of cryptoviruses, such as deniable password snatching, cryptocounters, private information retrieval, and in secure communication between different instances of a distributed cryptovirus.