Anna Aksenova, Anoop Johny, Tim Adams et al.
                                            
                                            In today's data-centric landscape, effective data stewardship is critical for facilitating scientific research and innovation. This article provides an overview of essential tools and frameworks for modern data stewardship practices. Over 3...
                                            
                                        
                                                Navigating pathways to automated personality prediction: a comparative study of small and medium language models
                                               
                                            
                                            
                                                Fatima Habib, Zeeshan Ali, Akbar Azam et al.
                                            
                                            Introduction: Recent advancements in Natural Language Processing (NLP) and widely available social media data have made it possible to predict human personalities in various computational applications. In this context, pr...
                                            
                                        
                                                When we talk about Big Data, What do we really mean? Toward a more precise definition of Big Data
                                               
                                            
                                            
                                                Xiaoyao Han,Oskar Josef Gstrein,Vasilios Andrikopoulos
                                            
                                            Despite the lack of consensus on an official definition of Big Data, research and studies have continued to progress based on this "no consensus" stance over the years. However, the lack of a clear definition and scope for Big Data results ...
                                            
                                        
                                                SparkDWM: a scalable design of a Data Washing Machine using Apache Spark
                                               
                                            
                                            
                                                Nicholas Kofi Akortia Hagan, John R Talburt
                                            
                                            Data volume has been one of the fast-growing assets of most real-world applications. This increases the rate of human errors such as duplication of records, misspellings, and erroneous transpositions, among other data quality issues. Entity...
                                            
                                        
                                                Deepfake: definitions, performance metrics and standards, datasets, and a meta-review
                                               
                                            
                                            
                                                Enes Altuncu, Virginia N L Franqueira, Shujun Li
                                            
                                            Recent advancements in AI, especially deep learning, have contributed to a significant increase in the creation of new realistic-looking synthetic media (video, image, and audio) and manipulation of existing media, which has led to the crea...
                                            
                                        
                                                Charles X Ling, Ganyu Wang, Boyu Wang
                                            
                                            Introduction: Recently, Google introduced Pathways as its next-generation AI architecture. Pathways must address three critical challenges: learning one general model for several continuous tasks, ensuring tasks can lever...
                                            
                                        
                                                Efficient use of binned data for imputing univariate time series data
                                               
                                            
                                            
                                                Jay Darji, Nupur Biswas, Vijay Padul et al.
                                            
                                            Time series data are recorded in various sectors, resulting in a large amount of data. However, the continuity of these data is often interrupted, resulting in periods of missing data. Several algorithms are used to impute the missing data,...
                                            
                                        
                                                Equitable differential privacy
                                               
                                            
                                            
                                                Vasundhara Kaul, Tamalika Mukherjee
                                            
                                            Differential privacy (DP) has been in the public spotlight since the announcement of its use in the 2020 U.S. Census. While DP algorithms have substantially improved the confidentiality protections provided to Census respondents, concerns h...
                                            
                                        
                                                Data science's cultural construction: qualitative ideas for quantitative work
                                               
                                            
                                            
                                                Philipp Brandt
                                            
                                            Introduction: "Data scientists" quickly became ubiquitous, often infamously so, but they have struggled with the ambiguity of their novel role. This article studies data science's collective definition on Twitter. ...
                                            
                                        
                                                The development and application of a novel E-commerce recommendation system used in electric power B2B sector
                                               
                                            
                                            
                                                Wenjun Meng, Lili Chen, Zhaomin Dong
                                            
                                            The advent of the digital era has transformed E-commerce platforms into critical tools for industry, yet traditional recommendation systems often fall short in the specialized context of the electric power industry. These systems typically ...